Remove Special Characters For A Pattern Java - java

I want to remove these characters from a String:
+ - ! ( ) { } [ ] ^ ~ : \
I also want to remove these sequences:
/*
*/
&&
||
I mean that I will not remove a single & or |; I will remove them only when the second character follows the first one (/* */ && ||).
How can I do that efficiently in Java?
Example:
a:b+c1|x||c*(?)
will be:
abc1|xc*?

This can be done via a long, but actually very simple regex.
String aString = "a:b+c1|x||c*(?)";
String sanitizedString = aString.replaceAll("[+\\-!(){}\\[\\]^~:\\\\]|/\\*|\\*/|&&|\\|\\|", "");
System.out.println(sanitizedString);
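Since the question asks for speed: String.replaceAll compiles the regex on every call, so if this runs often it may help to compile the pattern once and reuse it. A minimal sketch, assuming the same regex as above (the class and method names SpecialCharRemover and sanitize are made up for illustration):

```java
import java.util.regex.Pattern;

final class SpecialCharRemover {
    // Compiled once and reused; String.replaceAll would recompile the regex on every call.
    private static final Pattern UNWANTED = Pattern.compile(
            "[+\\-!(){}\\[\\]^~:\\\\]"   // any single character from the list
            + "|/\\*|\\*/"               // the two-character sequences /* and */
            + "|&&|\\|\\|");             // && and ||, but not a lone & or |

    static String sanitize(String input) {
        return UNWANTED.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("a:b+c1|x||c*(?)")); // prints abc1|xc*?
    }
}
```

Whether this is measurably faster depends on how often it is called and on the input size, so it is worth profiling before committing to it.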

I think that the java.lang.String.replaceAll(String regex, String replacement) is all you need:
http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#replaceAll(java.lang.String, java.lang.String).

There are two ways to do that:
1)
ArrayList<String> arrayList = new ArrayList<String>();
arrayList.add("+");
arrayList.add("-");
arrayList.add("||");
arrayList.add("&&");
arrayList.add("(");
arrayList.add(")");
arrayList.add("{");
arrayList.add("}");
arrayList.add("[");
arrayList.add("]");
arrayList.add("~");
arrayList.add("^");
arrayList.add(":");
arrayList.add("!");
arrayList.add("\\"); // a single backslash; a lone "/" must stay, only /* and */ are removed
arrayList.add("/*");
arrayList.add("*/");
String string = "a:b+c1|x||c*(?)";
for (int i = 0; i < arrayList.size(); i++) {
// replace() treats its argument as a literal, so no contains() check or regex escaping is needed
string = string.replace(arrayList.get(i), "");
}
System.out.println(string);
2)
String string = "a:b+c1|x||c*(?)";
string = string.replaceAll("[+\\-!(){}\\[\\]^~:\\\\]|/\\*|\\*/|&&|\\|\\|", "");
System.out.println(string);

Thomas wrote on How to remove special characters from a string?:
That depends on what you define as special characters, but try
replaceAll(...):
String result = yourString.replaceAll("[-+.^:,]","");
Note that the ^ character must not be the first one in the list, since
you'd then either have to escape it or it would mean "any but these
characters".
Another note: the - character needs to be the first or last one on the
list, otherwise you'd have to escape it or it would define a range (
e.g. :-, would mean "all characters in the range : to ,").
So, in order to keep consistency and not depend on character
positioning, you might want to escape all those characters that have a
special meaning in regular expressions (the following list is not
complete, so be aware of other characters like (, {, $ etc.):
String result = yourString.replaceAll("[\\-\\+\\.\\^:,]","");
If you want to get rid of all punctuation and symbols, try this regex:
[\p{P}\p{S}] (keep in mind that in Java strings you'd have to escape
backslashes: "[\\p{P}\\p{S}]").
A third way could be something like this, if you can exactly define
what should be left in your string:
String result = yourString.replaceAll("[^\\w\\s]","");
Here's less restrictive alternative to the "define allowed characters"
approach, as suggested by Ray:
String result = yourString.replaceAll("[^\\p{L}\\p{Z}]","");
The regex matches everything that is not a letter in any language and
not a separator (whitespace, linebreak etc.). Note that you can't use
[\P{L}\P{Z}] (upper case P means not having that property), since that
would mean "everything that is not a letter or not whitespace", which
almost matches everything, since letters are not whitespace and vice
versa.
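To compare the variants quoted above side by side, here is a small sketch. The class and method names are invented for illustration; the regexes are the ones from the quoted answer, with the punctuation/symbol properties placed inside a character class:

```java
final class CharClassDemo {
    // Removes only the explicitly listed characters.
    static String stripListed(String s) {
        return s.replaceAll("[\\-+.^:,]", "");
    }

    // Removes every punctuation (P) and symbol (S) character.
    static String stripPunctAndSymbols(String s) {
        return s.replaceAll("[\\p{P}\\p{S}]", "");
    }

    // Keeps word characters and whitespace; by default \w is ASCII-only in Java.
    static String keepWordAndSpace(String s) {
        return s.replaceAll("[^\\w\\s]", "");
    }

    // Keeps letters of any script plus separator characters.
    static String keepLettersAndSeparators(String s) {
        return s.replaceAll("[^\\p{L}\\p{Z}]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripListed("a-b+c.d^e:f,g!"));           // prints abcdefg!
        System.out.println(stripPunctAndSymbols("1+1=2, ok!"));      // prints 112 ok
        System.out.println(keepWordAndSpace("a_b c-d!"));            // prints a_b cd
        System.out.println(keepLettersAndSeparators("ab 12 cd!"));   // prints "ab  cd" (two spaces)
    }
}
```

One difference worth knowing: without the UNICODE_CHARACTER_CLASS flag, \w only covers [a-zA-Z_0-9], so the third variant drops accented letters while the fourth keeps them.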

Related

How can I remove whitespaces around the first occurrence of specific char?

How can I remove the whitespaces before and after a specific char? I want also to remove the whitespaces only around the first occurrence of the specific char. In the examples below, I want to remove the whitespaces before and after the first occurrence of =.
For example for those strings:
something = is equal to = something
something = is equal to = something
something =is equal to = something
I need to have this result:
something=is equal to = something
Is there any regular expression that I can use or should I check for the index of the first occurrence of the char =?
private String removeLeadingAndTrailingWhitespaceOfFirstEqualsSign(String s1) {
return s1.replaceFirst("\\s*=\\s*", "=");
}
Notice this matches all whitespace including tabs and new lines, not just space.
You can use the regular expression \w*\s*=\s* to get all matches. From there call trim on the first index in the array of matches.
Regex demo.
Yes - you can create a regex that matches optional whitespace followed by your pattern followed by optional whitespace, and then replace the first instance.
public static String replaceFirst(final String toMatch, final String forIP) {
// string you want to match before and after
final String quoted = Pattern.quote(toMatch);
final Pattern patt = Pattern.compile("\\s*" + quoted + "\\s*");
final Matcher match = patt.matcher(forIP);
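// quoteReplacement stops $ and \ in toMatch from being read as group references
return match.replaceFirst(Matcher.quoteReplacement(toMatch));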
}
For your inputs this gives the expected result, assuming toMatch is =. It also works with arbitrary longer strings, e.g. imagine passing "is equal to" instead, which gives
something =is equal to= something
For the simple case you can ignore the quoting, for an arbitrary case it helps (although as
many contributors have pointed out before the Pattern.quoting isn't good for every case).
The simple case thus becomes
return forIP.replaceFirst("\\s*" + toMatch + "\\s*", toMatch);
OR
return forIP.replaceFirst("\\s*=\\s*", "=");
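The question also asks whether checking the index of the first occurrence would work. It can; a rough sketch without any regex, assuming "whitespace" means whatever Character.isWhitespace accepts (the class and method names are hypothetical):

```java
final class FirstOccurrenceTrimmer {
    // Removes whitespace directly before and after the first occurrence of target.
    static String trimAroundFirst(String s, char target) {
        int idx = s.indexOf(target);
        if (idx < 0) {
            return s; // target not present, nothing to do
        }
        int left = idx;
        while (left > 0 && Character.isWhitespace(s.charAt(left - 1))) {
            left--;
        }
        int right = idx + 1;
        while (right < s.length() && Character.isWhitespace(s.charAt(right))) {
            right++;
        }
        return s.substring(0, left) + target + s.substring(right);
    }

    public static void main(String[] args) {
        // prints something=is equal to = something
        System.out.println(trimAroundFirst("something   =   is equal to = something", '='));
    }
}
```

The replaceFirst one-liner is shorter; the loop version mainly makes sense if you want to avoid regex overhead or need the index for something else anyway.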

having trouble with arrays and maybe split

String realstring = "&&&.&&&&";
Double value = 555.55555;
String[] arraystring = realstring.split(".");
String stringvalue = String.valueOf(value);
String [] valuearrayed = stringvalue.split(".");
System.out.println(arraystring[0]);
Sorry if it looks bad. Rewrote on my phone. I keep getting ArrayIndexOutOfBoundsException: 0 at the System.out.println. I have looked and can't figure it out. Thanks for the help.
split() takes a regexp as argument, not a literal string. You have to escape the dot:
string.split("\\.");
or
string.split(Pattern.quote("."));
Or you could also simply use indexOf('.') and substring() to get the two parts of your string.
And if the goal is to get the integer part of a double, you could also simply use
long truncated = (long) doubleValue;
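The indexOf/substring suggestion could look roughly like this. It is only a sketch; the names DotSplitter and splitAtFirstDot are invented here, and returning a one-element array when there is no dot is an assumption about what the caller wants:

```java
final class DotSplitter {
    // Splits at the first '.' without involving regular expressions.
    static String[] splitAtFirstDot(String s) {
        int dot = s.indexOf('.');
        if (dot < 0) {
            return new String[] { s }; // no dot: return the input unchanged
        }
        return new String[] { s.substring(0, dot), s.substring(dot + 1) };
    }

    public static void main(String[] args) {
        String[] parts = splitAtFirstDot("&&&.&&&&");
        System.out.println(parts[0]); // prints &&&
        System.out.println(parts[1]); // prints &&&&
    }
}
```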
split takes a regex as its parameter, and in a regex . means "any character except line separators", so you might expect "a.bc".split(".") to create an array of empty strings like ["","","","",""]. The only reason this does not happen is (from the split javadoc):
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
So because all the strings are empty, you get an empty array (and that is why you see ArrayIndexOutOfBoundsException).
To turn off this removal you have to use the split(regex, limit) version with a negative limit.
To split on a literal . you need to escape it as \. (which in Java must be written as "\\." because \ is also a metacharacter in String literals), or use [.] or another regex quoting mechanism.
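The behaviour described above is easy to check yourself. A small demo (the helper partCount is just a made-up name for this example):

```java
import java.util.Arrays;

final class SplitLimitDemo {
    // Number of parts produced by String.split for the given regex and limit.
    static int partCount(String s, String regex, int limit) {
        return s.split(regex, limit).length;
    }

    public static void main(String[] args) {
        String input = "a.bc";
        // Unescaped dot: every character is a delimiter and limit 0 drops the trailing empties.
        System.out.println(partCount(input, ".", 0));               // prints 0
        // A negative limit keeps the empty strings.
        System.out.println(partCount(input, ".", -1));              // prints 5
        // Escaped dot or character class: split on the literal character.
        System.out.println(Arrays.toString(input.split("\\.")));    // prints [a, bc]
        System.out.println(Arrays.toString(input.split("[.]")));    // prints [a, bc]
    }
}
```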
Dot (.) is a special character so you need to escape it.
String realstring = "&&&.&&&&";
String[] partsOfString = realstring.split("\\.");
String part1 = partsOfString[0];
String part2 = partsOfString[1];
System.out.println(part1);
this will print expected result of
&&&
It's also handy to test whether the given string contains this character first. You can do that like this:
if (string.contains(".")) {
// Split it.
} else {
throw new IllegalArgumentException("String " + string + " does not contain .");
}

the best way for character replacement in String in java

I want to check each character of a string and either replace it with another character or keep it. Because the string is long, the time this takes matters a lot. Which of these is the best way, or is there a better idea?
For all of them I append the result to a StringBuilder.
Check every character with a for loop and charAt.
Use a switch in the same kind of loop.
Use replaceAll twice.
And if one of the first two methods is better, is there a way to check a character against a group of characters, like:
if (st.charAt(i)=='a'..'z') ....
Edit:
Please tell me the least time-consuming way and explain why. I already know all of the ways you mentioned!
If you want to replace a single character (or a single sequence), use replace(), as other answers have suggested.
If you want to replace several characters (e.g., 'a', 'b', and 'c') with a single substitute character or character sequence (e.g., "X"), you should use a regular expression replace:
String result = original.replaceAll("[abc]", "X");
If you want to replace several characters, each with a different replacement (e.g., 'a' with 'A', 'b' with 'B'), then looping through the string yourself and building the result in a StringBuilder will probably be the most efficient. This is because, as you point out in your question, you will be going through the string only once.
StringBuilder sb = new StringBuilder();
String targets = "abc";
String replacements = "ABC";
for (int i = 0; i < original.length(); ++i) {
char c = original.charAt(i);
int loc = targets.indexOf(c);
sb.append(loc >= 0 ? replacements.charAt(loc) : c);
}
String result = sb.toString();
Check the documentation and find some good methods:
char from = 'a';
char to = 'b';
str = str.replace(from, to);
String replaceSample = "This String replace Example shows how to replace one char from String";
String newString = replaceSample.replace('r', 't');
Output: This Stting teplace Example shows how to teplace one chat ftom Stting
Also, you could use contains:
str1.toLowerCase().contains(str2.toLowerCase())
To check if the substring str2 exists in str1
Edit.
I just read that the String comes from a file. You can use a regex for this. That would be the best method.
http://docs.oracle.com/javase/tutorial/essential/regex/literals.html
This is your comment:
I want to replace all of the uppercases to lower cases and replace all
of the characters except a-z with space.
You can do it like this:
str = str.toLowerCase().replaceAll("[^a-z]", " ");
Your requirement should be part of the question, not in comment #7 under a posted answer...
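For comparison, the same lowercase-and-blank-out requirement can be done in a single pass with a StringBuilder, which also answers the 'a'..'z' part of the question: Java has no range literal, but two comparisons do the job. This is a sketch that only treats ASCII letters as letters; the class and method names are invented:

```java
final class SinglePassNormalizer {
    // Lowercases ASCII letters and replaces every other character with a space.
    static String normalize(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 'a' && c <= 'z') {
                sb.append(c);                          // already a lowercase letter
            } else if (c >= 'A' && c <= 'Z') {
                sb.append((char) (c + ('a' - 'A')));   // shift uppercase to lowercase
            } else {
                sb.append(' ');                        // anything else becomes a space
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("Hello, World 42!"));
    }
}
```

For plain ASCII text this should give the same result as the toLowerCase().replaceAll(...) one-liner while walking the string only once and skipping the regex engine. Whether the difference matters is something to measure on your own data.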
You should look into regex for Java. You can match an entire set of characters. Strings have several methods, replace, replaceAll, and matches, which you may find useful here.
You can match the set of alphabetic characters, for instance, using [a-zA-Z] (or [a-zA-Z0-9] for alphanumerics), which may be what you're looking for.

how could I split the string into valid substrings regardless of how many blanks are inside

For example: after execution, the output for the Strings "hello world yo" and "hello   world  yo" should be exactly the same.
What's more, the output should be a String[] in which:
String[0] == "hello"; String[1] == "world"; String[2] == "yo";
so that other methods can deal with the actual words later.
I was thinking about String.split(" "), but the number of blanks between the words is uncertain, and that will then cause an exception.
You can use
String.split("\\s+") // one or more whitespace.
Don't use == for string comparison; use String.equals() instead.
Edit for question in comment
what's the notation called? what if there is one or more "_" or "\n" ?
As you can see, the String#split() API accepts a regex as its parameter. \s is the shorthand character class for whitespace, whereas + repeats the previous item one or more times.
Now if you want to split String on
_ ie. underscore --> "this__is_test".split("[_]+");
\n ie. newline --> "this__is_test\n new line".split("\\r?\\n");
Regex Tutorial
You can split on "\\s+". That splits on one or more whitespace characters.
String.split() takes a regexp, so you can simply do String.split(" +").
I think the split function takes regex, but if it doesn't then the below works.
The regex in this might not be right, but it demonstrates the concept of what you're trying to do.
Pattern p = Pattern.compile("(.*?) *(.*)");
Matcher m = p.matcher(s);
if (m.matches()) {
    String name = m.group(1);
    String value = m.group(2);
    // group(i) may only be called after a successful match, so the loop lives inside the if
    for (int i = 1; i <= m.groupCount(); i++) {
        System.out.println(m.group(i));
    }
}
You can use a regular expression to split the String:
String.split(regularExpression);
For multiple whitespace characters, use the regular expression "\\s+", where \s matches any whitespace character.
The == operator tests whether its left and right operands are equal. Strings are objects, so == compares their references (like pointers in C), not their contents.
So if you want to compare the contents of two Strings, use the equals method of String,
e.g. str1.equals(str2)
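One detail the answers above skip: if the input starts with whitespace, split("\\s+") returns an empty first element. A minimal sketch that trims first (WordSplitter and words are hypothetical names, and returning an empty array for blank input is an assumption about the desired behaviour):

```java
import java.util.Arrays;

final class WordSplitter {
    // Splits on runs of whitespace, ignoring leading and trailing whitespace.
    static String[] words(String s) {
        String trimmed = s.trim();
        if (trimmed.isEmpty()) {
            return new String[0]; // "".split(...) would otherwise return one empty string
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // Both lines print [hello, world, yo]
        System.out.println(Arrays.toString(words("hello world yo")));
        System.out.println(Arrays.toString(words("  hello   world  yo ")));
    }
}
```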

Java split a CSV ignoring HTML characters

I need to split a string by semicolon ignoring the semicolons that may come as HTML characters.
For instance, given the string:
id=com.google.android;keywords=Android&#59;Operating System&#59;Phone;versions=Gingerbread&#59;ICS&#59;JB
I need to split it into:
id=com.google.android
keywords=Android&#59;Operating System&#59;Phone
versions=Gingerbread&#59;ICS&#59;JB
Any idea how to do this?
A regex like (?<!&#?[0-9a-zA-Z]+); would probably do it. This would prevent matching a semicolon that terminates an entity reference or character reference, though it also catches a few cases that are not technically either by the specs (e.g. it wouldn't match the semicolon at the end of &#foo; or &123;).
(?<!...) is a "negative lookbehind", so you can read this regex as matching a semicolon that is not preceded by a substring that matches &#?[0-9a-zA-Z]+ (i.e. ampersand, optional hash, and one or more alphanumerics). However lookbehinds must have an upper bound on the number of characters they can match, which + doesn't, so you'll have to use a bounded repetition count, like {1,5} rather than the unbounded +. The upper bound needs to be at least as long as the longest entity reference you might see, and if your data might contain arbitrary entity references then you'll have to use something like the length of the string as the upper bound.
String[] keyValuePairs = theString.split(
"(?<!&#?[0-9a-zA-Z]{1," + theString.length() + "});");
If you can specify a smaller bound then that would probably be more efficient.
Edit: Android apparently doesn't like this lookbehind, even with bounded repetition, so you probably won't be able to do this with a single regex and String.split. You'll have to do the looping yourself, e.g.
Pattern p = Pattern.compile("(?:&#?[0-9a-zA-Z]+)?;");
Matcher m = p.matcher(theString);
List<String> splits = new ArrayList<String>();
int lastEltStart = 0;
while(m.find()) {
if(m.end() - m.start() > 1) {
// this match was an entity/character reference so don't split here
continue;
}
if(m.start() > lastEltStart) {
// non-empty part
splits.add(theString.substring(lastEltStart, m.start()));
}
lastEltStart = m.end();
}
if(lastEltStart < theString.length()) {
// non-empty final part
splits.add(theString.substring(lastEltStart));
}
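Wrapped into a self-contained class, the loop above could be tried out like this. The sample string assumes the embedded semicolons arrive as the character reference &#59; (an assumption about the data), and the class and method names are made up. A bounded-lookbehind variant for desktop Java is included for comparison:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EntityAwareSplitter {
    // A semicolon, optionally preceded by the body of an entity/character reference.
    private static final Pattern SEMI = Pattern.compile("(?:&#?[0-9a-zA-Z]+)?;");

    // Manual loop: splits only on semicolons that do not terminate a reference.
    static List<String> split(String s) {
        List<String> parts = new ArrayList<>();
        Matcher m = SEMI.matcher(s);
        int start = 0;
        while (m.find()) {
            if (m.end() - m.start() > 1) {
                continue; // matched a whole reference such as &#59; so do not split here
            }
            if (m.start() > start) {
                parts.add(s.substring(start, m.start())); // non-empty part
            }
            start = m.end();
        }
        if (start < s.length()) {
            parts.add(s.substring(start)); // non-empty final part
        }
        return parts;
    }

    // Bounded negative lookbehind; only covers numeric references with 2 or 3 digits.
    static List<String> splitWithLookbehind(String s) {
        return Arrays.asList(s.split("(?<!&#\\d{2,3});"));
    }

    public static void main(String[] args) {
        String input = "id=a.b;keywords=Android&#59;Phone;versions=ICS&#59;JB";
        System.out.println(split(input));
        System.out.println(splitWithLookbehind(input));
    }
}
```

Note that the two variants differ on edge cases: the loop skips empty parts and treats named references like &amp; as references, while the lookbehind version only recognises numeric ones.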
Since the HTML entities have only two or three digits between the '&#' and ';', I used the following regex:
(?<!&#\d{2,3});
