How to remove invalid characters from a string?


I have no idea how to remove invalid characters from a string in Java. I'm trying to remove all the characters that are not numbers, letters, or ( ) [ ] . How can I do this?
Thanks

String foo = "this is a thing with & in it";
foo = foo.replaceAll("[^A-Za-z0-9()\\[\\]]", "");
Javadocs are your friend. Regular expressions are also your friend.
Edit:
That being said, this only covers the Latin alphabet; you can adjust accordingly. \\w can stand in for A-Za-z0-9 as a "word" character class if that works for your case, though it also includes _.
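For the "adjust accordingly" case, one option is to swap the Latin ranges for Unicode property classes. The following is a minimal sketch and not part of the original answer; the names InvalidCharFilter and clean are made up for illustration, and the choice of \p{L} and \p{Nd} as "letters and numbers" is an assumption.

```java
import java.util.regex.Pattern;

final class InvalidCharFilter {
    // Matches anything that is NOT a Unicode letter, a Unicode decimal digit,
    // or one of the four bracket characters ( ) [ ].
    private static final Pattern INVALID =
            Pattern.compile("[^\\p{L}\\p{Nd}()\\[\\]]");

    // Returns the input with every invalid character removed.
    static String clean(String input) {
        return INVALID.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(clean("this is a thing with & in it"));
        System.out.println(clean("caf\u00e9 [42] (ok)!"));
    }
}
```

Note that spaces are removed as well, exactly as in the ASCII-only pattern above; add a space to the character class if you want to keep them.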

Using Guava, and almost certainly more efficient (and more readable) than regexes:
CharMatcher desired = CharMatcher.JAVA_DIGIT
.or(CharMatcher.JAVA_LETTER)
.or(CharMatcher.anyOf("()[]"))
.precomputed(); // optional, may improve performance, YMMV
return desired.retainFrom(string);
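If Guava is not on the classpath, roughly the same retain-only filter can be sketched with the JDK alone. This is an approximation rather than Guava's implementation: Character.isLetterOrDigit stands in for the two matchers, and RetainFilter and retainAllowed are hypothetical names.

```java
final class RetainFilter {
    // Characters that are allowed in addition to letters and digits.
    private static final String EXTRA = "()[]";

    // Keeps letters, digits and the characters in EXTRA; drops everything else.
    static String retainAllowed(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isLetterOrDigit(c) || EXTRA.indexOf(c) >= 0) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(retainAllowed("123abc&^%[]()")); // prints 123abc[]()
    }
}
```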

Try this:
String s = "123abc&^%[]()";
s = s.replaceAll("[^A-Za-z0-9()\\[\\]]", "");
System.out.println(s);
The above will remove characters "&^%" in the sample string, leaving in s only "123abc[]()".

public static void main(String[] args) {
String c = "hjdg$h&jk8^i0ssh6+/?:().,+-#";
System.out.println(c);
Pattern pt = Pattern.compile("[^a-zA-Z0-9/?:().,'+/-]");
Matcher match = pt.matcher(c);
if (!match.matches()) {
c = c.replaceAll(pt.pattern(), "");
}
System.out.println(c);
}

Use this code:
String s = "Test[]";
s = s.replace("[", ""); // replace() takes literal text, so no regex escaping is needed
s = s.replace("]", "");

myString.replaceAll("[^\\w\\[\\]\\(\\)]", "");
The replaceAll method takes a regex as its first parameter and replaces every match in the string. This regex matches every character that is not a digit, letter, or underscore (\\w) or one of the brackets you need (\\[\\]\\(\\)).

You can remove special characters from a String, a URL, or any request parameter received from the user:
public static String removeSpecialCharacters(String inputString){
final String[] metaCharacters = {"../","\\..","\\~","~/","~"};
String outputString="";
for (int i = 0 ; i < metaCharacters.length ; i++){
if(inputString.contains(metaCharacters[i])){
outputString = inputString.replace(metaCharacters[i],"");
inputString = outputString;
}else{
outputString = inputString;
}
}
return outputString;
}

You can specify the range of characters to keep/remove based on the order of characters in the ASCII table. The regex can use actual characters or character hex codes:
// Example - remove characters outside of the range of "space to tilde".
// 1) using characters
someString.replaceAll("[^ -~]", "");
// 2) using hex codes for "space" and "tilde"
someString.replaceAll("[^\\u0020-\\u007E]", "");
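As a quick check that the two spellings behave the same, here is a small sketch; the class and method names and the sample input are illustrative assumptions. A tab (below the range) and an accented letter (above it) are both dropped, while spaces and the tilde survive.

```java
final class PrintableAsciiFilter {
    // Variant 1: range written with the literal characters space and tilde.
    static String byCharacters(String s) {
        return s.replaceAll("[^ -~]", "");
    }

    // Variant 2: the same range written with Unicode escapes inside the regex.
    static String byHexCodes(String s) {
        return s.replaceAll("[^\\u0020-\\u007E]", "");
    }

    public static void main(String[] args) {
        String input = "tab\there caf\u00e9~";
        System.out.println(byCharacters(input)); // prints tabhere caf~
        System.out.println(byHexCodes(input));   // prints tabhere caf~
    }
}
```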

Related

How to remove leading 0 in the time timestamp 02:25PM using java? [duplicate]

I've seen questions on how to prefix zeros here in SO. But not the other way!
Can you suggest how to remove the leading zeros in alphanumeric text? Are there any built-in APIs, or do I need to write a method to trim the leading zeros?
Example:
01234 converts to 1234
0001234a converts to 1234a
001234-a converts to 1234-a
101234 remains as 101234
2509398 remains as 2509398
123z remains as 123z
000002829839 converts to 2829839

Regex is the best tool for the job; what it should be depends on the problem specification. The following removes leading zeroes, but leaves one if necessary (i.e. it wouldn't just turn "0" to a blank string).
s.replaceFirst("^0+(?!$)", "")
The ^ anchor will make sure that the 0+ being matched is at the beginning of the input. The (?!$) negative lookahead ensures that not the entire string will be matched.
Test harness:
String[] in = {
"01234", // "[1234]"
"0001234a", // "[1234a]"
"101234", // "[101234]"
"000002829839", // "[2829839]"
"0", // "[0]"
"0000000", // "[0]"
"0000009", // "[9]"
"000000z", // "[z]"
"000000.z", // "[.z]"
};
for (String s : in) {
System.out.println("[" + s.replaceFirst("^0+(?!$)", "") + "]");
}
See also
regular-expressions.info
repetitions, lookarounds, and anchors
String.replaceFirst(String regex)

You can use the StringUtils class from Apache Commons Lang like this:
StringUtils.stripStart(yourString,"0");

If you are using Kotlin, this is the only code you need:
yourString.trimStart('0')

How about the regex way:
String s = "001234-a";
s = s.replaceFirst ("^0*", "");
The ^ anchors to the start of the string (I'm assuming from context your strings are not multi-line here, otherwise you may need to look into \A for start of input rather than start of line). The 0* means zero or more 0 characters (you could use 0+ as well). The replaceFirst just replaces all those 0 characters at the start with nothing.
And if, like Vadzim, your definition of leading zeros doesn't include turning "0" (or "000" or similar strings) into an empty string (a rational enough expectation), simply put it back if necessary:
String s = "00000000";
s = s.replaceFirst ("^0*", "");
if (s.isEmpty()) s = "0";

A clear way without any need of regExp and any external libraries.
public static String trimLeadingZeros(String source) {
for (int i = 0; i < source.length(); ++i) {
char c = source.charAt(i);
if (c != '0') {
return source.substring(i);
}
}
return ""; // or return "0";
}

To go with thelost's Apache Commons answer: using guava-libraries (Google's general-purpose Java utility library which I would argue should now be on the classpath of any non-trivial Java project), this would use CharMatcher:
CharMatcher.is('0').trimLeadingFrom(inputString);

You could just do:
String s = Integer.valueOf("0001007").toString();

Use this:
String x = "00123".replaceAll("^0*", ""); // -> 123

Use Apache Commons StringUtils class:
StringUtils.strip(String str, String stripChars);

Using Regexp with groups:
Pattern pattern = Pattern.compile("(0*)(.*)");
String result = "";
Matcher matcher = pattern.matcher(content);
if (matcher.matches())
{
// first group contains 0, second group the remaining characters
// 000abcd - > 000, abcd
result = matcher.group(2);
}
return result;

Using regex as some of the answers suggest is a good way to do that. If you don't want to use regex then you can use this code:
String s = "00a0a121";
while(s.length()>0 && s.charAt(0)=='0')
{
s = s.substring(1);
}

If you (like me) need to remove all the leading zeros from each "word" in a string, you can modify @polygenelubricants' answer to the following:
String s = "003 d0g 00ss 00 0 00";
s = s.replaceAll("\\b0+(?!\\b)", ""); // Strings are immutable, so keep the result
which results in:
3 d0g ss 0 0 0

I think this is easy to do. Just loop over the string from the start, skipping zeros until you find a non-zero character.
// Index of the first character that is not '0' (str.length() if all are zeros)
int firstNonZeroIndex = 0;
while (firstNonZeroIndex < str.length() && str.charAt(firstNonZeroIndex) == '0') {
    firstNonZeroIndex++;
}
str = str.substring(firstNonZeroIndex);

Without using Regex or substring() function on String which will be inefficient -
public static String removeZero(String str){
StringBuffer sb = new StringBuffer(str);
while (sb.length()>1 && sb.charAt(0) == '0')
sb.deleteCharAt(0);
return sb.toString(); // return in String
}

Using kotlin it is easy
value.trimStart('0')

You could use a regex to replace "^0*(.*)" with "$1".

String s="0000000000046457657772752256266542=56256010000085100000";
String removeString="";
for(int i =0;i<s.length();i++){
if(s.charAt(i)=='0')
removeString=removeString+"0";
else
break;
}
System.out.println("original string - "+s);
System.out.println("after removing 0's -"+s.replaceFirst(removeString,""));

If you don't want to use regex or external library.
You can do with "for":
String input="0000008008451";
String output = input.trim();
for( ;output.length() > 1 && output.charAt(0) == '0'; output = output.substring(1));
System.out.println(output);//8008451

I ran some benchmarks and found that the fastest way (by far) is this solution:
private static String removeLeadingZeros(String s) {
try {
Integer intVal = Integer.parseInt(s);
s = intVal.toString();
} catch (Exception ex) {
// whatever
}
return s;
}
Regular expressions in particular are very slow over many iterations. (I needed to find the fastest way for a batch job.)

And what about just searching for the first non-zero character?
[1-9]\d*
This regex finds the first digit between 1 and 9 followed by any number of digits (including none), so for "00012345" it returns "12345".
It can be easily adapted for alphanumeric strings.
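One way to adapt the idea to alphanumeric input is to search for the first character that is not 0 and keep everything from there. This is a minimal sketch, assuming single-line input and assuming that an all-zero string should collapse to "0"; the names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LeadingZeroFinder {
    // First character that is not '0', followed by the rest of the line.
    private static final Pattern FROM_FIRST_NON_ZERO = Pattern.compile("[^0].*");

    static String stripLeadingZeros(String s) {
        Matcher m = FROM_FIRST_NON_ZERO.matcher(s);
        if (m.find()) {
            return m.group();
        }
        // No non-zero character at all: the string is empty or only zeros.
        return s.isEmpty() ? s : "0";
    }

    public static void main(String[] args) {
        System.out.println(stripLeadingZeros("0001234a")); // prints 1234a
        System.out.println(stripLeadingZeros("001234-a")); // prints 1234-a
        System.out.println(stripLeadingZeros("0000000"));  // prints 0
    }
}
```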

How to split a string in JAVA with two different separators? [duplicate]

I want to split the string "004-034556" into two strings by the delimiter "-":
part1 = "004";
part2 = "034556";
That means the first string will contain the characters before '-', and the second string will contain the characters after '-'.
I also want to check if the string has '-' in it.

Use the appropriately named method String#split().
String string = "004-034556";
String[] parts = string.split("-");
String part1 = parts[0]; // 004
String part2 = parts[1]; // 034556
Note that split's argument is assumed to be a regular expression, so remember to escape special characters if necessary.
There are 12 characters with special meanings: the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening parenthesis (, the closing parenthesis ), the opening square bracket [, and the opening curly brace {. These special characters are often called "metacharacters".
For instance, to split on a period/dot . (which means "any character" in regex), use either backslash \ to escape the individual special character like so split("\\."), or use character class [] to represent literal character(s) like so split("[.]"), or use Pattern#quote() to escape the entire string like so split(Pattern.quote(".")).
String[] parts = string.split(Pattern.quote(".")); // Split on the exact string.
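To see why the escaping matters, the sketch below counts the parts produced by each spelling of the delimiter. The class and method names are illustrative assumptions. With an unescaped dot every character is treated as a delimiter, so only empty strings are produced and split() drops them all as trailing empties.

```java
import java.util.regex.Pattern;

final class DotSplitDemo {
    // Number of parts String.split produces for the given regex.
    static int partCount(String input, String regex) {
        return input.split(regex).length;
    }

    public static void main(String[] args) {
        String input = "a.b.c";
        System.out.println(partCount(input, "."));                // prints 0
        System.out.println(partCount(input, "\\."));              // prints 3
        System.out.println(partCount(input, "[.]"));              // prints 3
        System.out.println(partCount(input, Pattern.quote("."))); // prints 3
    }
}
```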
To test beforehand if the string contains certain character(s), just use String#contains().
if (string.contains("-")) {
// Split it.
} else {
throw new IllegalArgumentException("String " + string + " does not contain -");
}
Note, this does not take a regular expression. For that, use String#matches() instead.
If you'd like to retain the split character in the resulting parts, then make use of positive lookaround. In case you want to have the split character to end up in left hand side, use positive lookbehind by prefixing ?<= group on the pattern.
String string = "004-034556";
String[] parts = string.split("(?<=-)");
String part1 = parts[0]; // 004-
String part2 = parts[1]; // 034556
In case you want to have the split character to end up in right hand side, use positive lookahead by prefixing ?= group on the pattern.
String string = "004-034556";
String[] parts = string.split("(?=-)");
String part1 = parts[0]; // 004
String part2 = parts[1]; // -034556
If you'd like to limit the number of resulting parts, then you can supply the desired number as 2nd argument of split() method.
String string = "004-034556-42";
String[] parts = string.split("-", 2);
String part1 = parts[0]; // 004
String part2 = parts[1]; // 034556-42

An alternative to processing the string directly would be to use a regular expression with capturing groups. This has the advantage that it makes it straightforward to impose more sophisticated constraints on the input. For example, the following splits the string into two parts, and ensures that both consist only of digits:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
class SplitExample
{
private static Pattern twopart = Pattern.compile("(\\d+)-(\\d+)");
public static void checkString(String s)
{
Matcher m = twopart.matcher(s);
if (m.matches()) {
System.out.println(s + " matches; first part is " + m.group(1) +
", second part is " + m.group(2) + ".");
} else {
System.out.println(s + " does not match.");
}
}
public static void main(String[] args) {
checkString("123-4567");
checkString("foo-bar");
checkString("123-");
checkString("-4567");
checkString("123-4567-890");
}
}
As the pattern is fixed in this instance, it can be compiled in advance and stored as a static member (initialised at class load time in the example). The regular expression is:
(\d+)-(\d+)
The parentheses denote the capturing groups; the string that matched that part of the regexp can be accessed by the Matcher.group() method, as shown. The \d matches a single decimal digit, and the + means "match one or more of the previous expression". The - has no special meaning, so it just matches that character in the input. Note that you need to double-escape the backslashes when writing this as a Java string. Some other examples:
([A-Z]+)-([A-Z]+) // Each part consists of only capital letters
([^-]+)-([^-]+) // Each part consists of characters other than -
([A-Z]{2})-(\d+) // The first part is exactly two capital letters,
// the second consists of digits
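As an illustration of the last pattern, the sketch below wraps it in a small parser. The names CodeParser and parse are hypothetical, and returning null for non-matching input is just one possible policy.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CodeParser {
    // Exactly two capital letters, a hyphen, then one or more digits.
    private static final Pattern CODE = Pattern.compile("([A-Z]{2})-(\\d+)");

    // Returns {letters, digits}, or null if the whole input does not match.
    static String[] parse(String s) {
        Matcher m = CODE.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("AB-123");
        System.out.println(parts[0] + " / " + parts[1]); // prints AB / 123
        System.out.println(parse("ABC-123") == null);     // prints true
    }
}
```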

Use:
String[] result = yourString.split("-");
if (result.length != 2)
throw new IllegalArgumentException("String not in correct format");
This will split your string into two parts. The first element in the array will be the part containing the stuff before the -, and the second element in the array will contain the part of your string after the -.
If the array length is not 2, then the string was not in the format: string-string.
Check out the split() method in the String class.

This:
String[] out = string.split("-");
should do what you want. The String class has many methods for operating on strings.

// This leaves the regexes issue out of question
// But we must remember that each character in the Delimiter String is treated
// like a single delimiter
public static String[] SplitUsingTokenizer(String subject, String delimiters) {
StringTokenizer strTkn = new StringTokenizer(subject, delimiters);
ArrayList<String> arrLis = new ArrayList<String>(subject.length());
while(strTkn.hasMoreTokens())
arrLis.add(strTkn.nextToken());
return arrLis.toArray(new String[0]);
}

With Java 8:
List<String> stringList = Pattern.compile("-")
.splitAsStream("004-034556")
.collect(Collectors.toList());
stringList.forEach(s -> System.out.println(s));

Use org.apache.commons.lang.StringUtils' split method which can split strings based on the character or string you want to split.
Method signature:
public static String[] split(String str, char separatorChar);
In your case, you want to split a string when there is a "-".
You can simply do as follows:
String str = "004-034556";
String split[] = StringUtils.split(str,"-");
Output:
004
034556
Note that if - does not exist in your string, it returns an array containing just the given string, and you will not get an exception.

The requirements left room for interpretation. I recommend writing a method,
public final static String[] mySplit(final String s)
which encapsulates this function. Of course you can use String.split(..) as mentioned in the other answers for the implementation.
You should write some unit-tests for input strings and the desired results and behaviour.
Good test candidates should include:
- "0022-3333"
- "-"
- "5555-"
- "-333"
- "3344-"
- "--"
- ""
- "553535"
- "333-333-33"
- "222--222"
- "222--"
- "--4555"
By defining the expected result for each test, you specify the behaviour.
For example, should "-333" return [,333], or is it an error?
Can "333-333-33" be separated in [333,333-33] or [333-333,33] or is it an error? And so on.
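To make this concrete, here is one possible mySplit under a deliberately strict policy. The policy is an assumption chosen for illustration, not something the question specifies: exactly one -, with a non-empty part on each side, and everything else rejected. The class name is hypothetical.

```java
final class TwoPartSplitter {
    // Assumed policy: exactly one '-', both sides non-empty; anything else is an error.
    public static String[] mySplit(final String s) {
        if (s == null) {
            throw new IllegalArgumentException("input is null");
        }
        int p = s.indexOf('-');
        boolean missingOrLeading = p <= 0;
        boolean trailing = p == s.length() - 1;
        boolean moreThanOne = p >= 0 && s.indexOf('-', p + 1) >= 0;
        if (missingOrLeading || trailing || moreThanOne) {
            throw new IllegalArgumentException("expected <left>-<right> but got: " + s);
        }
        return new String[] { s.substring(0, p), s.substring(p + 1) };
    }

    public static void main(String[] args) {
        String[] parts = mySplit("0022-3333");
        System.out.println(parts[0] + " and " + parts[1]); // prints 0022 and 3333
    }
}
```

Under this policy every other candidate in the list above throws IllegalArgumentException; a more lenient policy would change only the condition, and the tests would document the difference.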

To summarize: there are at least five ways to split a string in Java:
String.split():
String[] parts ="10,20".split(",");
Pattern.compile(regexp).splitAsStream(input):
List<String> strings = Pattern.compile("\\|")
.splitAsStream("010|020202")
.collect(Collectors.toList());
StringTokenizer (legacy class):
StringTokenizer strings = new StringTokenizer("Welcome to EXPLAINJAVA.COM!", ".");
while(strings.hasMoreTokens()){
String substring = strings.nextToken();
System.out.println(substring);
}
Google Guava Splitter:
Iterable<String> result = Splitter.on(",").split("1,2,3,4");
Apache Commons StringUtils:
String[] strings = StringUtils.split("1,2,3,4", ",");
So you can choose the best option for you depending on what you need, e.g. return type (array, list, or iterable).
Here is a big overview of these methods and the most common examples (how to split by dot, slash, question mark, etc.)

You can also try it like this:
String concatenated_String="hi^Hello";
String split_string_array[]=concatenated_String.split("\\^");

Assuming, that
you don't really need regular expressions for your split
you happen to already use apache commons lang in your app
The easiest way is to use StringUtils#split(java.lang.String, char). That's more convenient than the one provided by Java out of the box if you don't need regular expressions. Like its manual says, it works like this:
A null input String returns null.
StringUtils.split(null, *) = null
StringUtils.split("", *) = []
StringUtils.split("a.b.c", '.') = ["a", "b", "c"]
StringUtils.split("a..b.c", '.') = ["a", "b", "c"]
StringUtils.split("a:b:c", '.') = ["a:b:c"]
StringUtils.split("a b c", ' ') = ["a", "b", "c"]
I would recommend using commons-lang, since it usually contains a lot of useful stuff. However, if you don't need it for anything other than a split, then implementing it yourself or escaping the regex is a better option.

For simple use cases String.split() should do the job. If you use guava, there is also a Splitter class which allows chaining of different string operations and supports CharMatcher:
Splitter.on('-')
.trimResults()
.omitEmptyStrings()
.split(string);

The fastest way, which also consumes the fewest resources, could be:
String s = "abc-def";
int p = s.indexOf('-');
if (p >= 0) {
String left = s.substring(0, p);
String right = s.substring(p + 1);
} else {
// s does not contain '-'
}

String Split with multiple characters using Regex
public class StringSplitTest {
public static void main(String args[]) {
String s = " ;String; String; String; String, String; String;;String;String; String; String; ;String;String;String;String";
//String[] strs = s.split("[,\\s\\;]");
String[] strs = s.split("[,\\;]");
System.out.println("Substrings length:"+strs.length);
for (int i=0; i < strs.length; i++) {
System.out.println("Str["+i+"]:"+strs[i]);
}
}
}
Output:
Substrings length:17
Str[0]:
Str[1]:String
Str[2]: String
Str[3]: String
Str[4]: String
Str[5]: String
Str[6]: String
Str[7]:
Str[8]:String
Str[9]:String
Str[10]: String
Str[11]: String
Str[12]:
Str[13]:String
Str[14]:String
Str[15]:String
Str[16]:String
But do not expect the same output across all JDK versions. I have seen a bug in some JDK versions where the leading empty string was ignored. This bug is not present in the latest JDK version, but it exists in some versions between late JDK 1.7 and early 1.8.

There are only two methods you really need to consider.
Use String.split for a one-character delimiter or when you don't care about performance
If performance is not an issue, or if the delimiter is a single character that is not a regular expression special character (i.e., not one of .$|()[{^?*+\) then you can use String.split.
String[] results = input.split(",");
The split method has an optimization to avoid using a regular expression if the delimiter is a single character and not in the above list. Otherwise, it has to compile a regular expression, and this is not ideal.
Use Pattern.split and precompile the pattern if using a complex delimiter and you care about performance.
If performance is an issue, and your delimiter is not one of the above, you should pre-compile a regular expression pattern which you can then reuse.
// Save this somewhere
Pattern pattern = Pattern.compile("[,;:]");
/// ... later
String[] results = pattern.split(input);
This last option still creates a new Matcher object. You can also cache this object and reset it for each input for maximum performance, but that is somewhat more complicated and not thread-safe.
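Put together, the precompiled variant might look like the sketch below; the class and method names are illustrative. A Pattern is immutable and safe to share between threads, so holding it in a static final field is enough. The thread-safety caveat applies only to a cached Matcher.

```java
import java.util.regex.Pattern;

final class DelimiterSplitter {
    // Compiled once at class load time and reused for every call.
    private static final Pattern DELIMITERS = Pattern.compile("[,;:]");

    static String[] split(String input) {
        return DELIMITERS.split(input);
    }

    public static void main(String[] args) {
        for (String part : split("a,b;c:d")) {
            System.out.println(part);
        }
    }
}
```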

You can split a string by a line break by using the following statement:
String textStr[] = yourString.split("\\r?\\n");
You can split a string by a hyphen/character by using the following statement:
String textStr[] = yourString.split("-");

public class SplitTest {
public static String[] split(String text, String delimiter) {
java.util.List<String> parts = new java.util.ArrayList<String>();
text += delimiter;
for (int i = text.indexOf(delimiter), j=0; i != -1;) {
String temp = text.substring(j,i);
if(temp.trim().length() != 0) {
parts.add(temp);
}
j = i + delimiter.length();
i = text.indexOf(delimiter,j);
}
return parts.toArray(new String[0]);
}
public static void main(String[] args) {
String str = "004-034556";
String delimiter = "-";
String result[] = split(str, delimiter);
for(String s:result)
System.out.println(s);
}
}

Please don't use StringTokenizer class as it is a legacy class that is retained for compatibility reasons, and its use is discouraged in new code. And we can make use of the split method as suggested by others as well.
String[] sampleTokens = "004-034556".split("-");
System.out.println(Arrays.toString(sampleTokens));
And as expected it will print:
[004, 034556]
In this answer I also want to point out one change that has taken place for split method in Java 8. The String#split() method makes use of Pattern.split, and now it will remove empty strings at the start of the result array. Notice this change in documentation for Java 8:
When there is a positive-width match at the beginning of the input
sequence then an empty leading substring is included at the beginning
of the resulting array. A zero-width match at the beginning however
never produces such empty leading substring.
It means for the following example:
String[] sampleTokensAgain = "004".split("");
System.out.println(Arrays.toString(sampleTokensAgain));
we will get three strings: [0, 0, 4] and not four as was the case in Java 7 and before. Also check this similar question.
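The leading and trailing cases are easy to mix up, so here is a small sketch of the behaviour on Java 8 and later. The helper name describe is an assumption for illustration. A positive-width match at the start keeps the empty leading string, a zero-width match does not, and trailing empty strings are dropped unless a negative limit is passed.

```java
import java.util.Arrays;

final class SplitEdgeCases {
    // Renders the result of input.split(regex) as text for easy comparison.
    static String describe(String input, String regex) {
        return Arrays.toString(input.split(regex));
    }

    public static void main(String[] args) {
        System.out.println(describe("004", ""));  // prints [0, 0, 4]
        System.out.println(describe("-a", "-"));  // prints [, a]
        System.out.println(describe("a-", "-"));  // prints [a]
        // A negative limit keeps trailing empty strings.
        System.out.println(Arrays.toString("a-".split("-", -1))); // prints [a, ]
    }
}
```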

One way to do this is to run through the String in a for-each loop and use the required split character.
public class StringSplitTest {
public static void main(String[] arg){
String str = "004-034556";
String split[] = str.split("-");
System.out.println("The split parts of the String are");
for(String s:split)
System.out.println(s);
}
}
Output:
The split parts of the String are
004
034556

import java.io.*;
public class BreakString {
public static void main(String args[]) {
String string = "004-034556-1234-2341";
String[] parts = string.split("-");
for(int i=0;i<parts.length;i++) {
System.out.println(parts[i]);
}
}
}

You can use split():
import java.io.*;
public class Splitting
{
public static void main(String args[])
{
String Str = new String("004-034556");
String[] SplittoArray = Str.split("-");
String string1 = SplittoArray[0];
String string2 = SplittoArray[1];
}
}
Else, you can use StringTokenizer:
import java.util.*;
public class Splitting
{
public static void main(String[] args)
{
StringTokenizer Str = new StringTokenizer("004-034556");
String string1 = Str.nextToken("-");
String string2 = Str.nextToken("-");
}
}

Here are two ways to achieve it.
WAY 1: As you have to split two numbers by a special character you can use regex
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class TrialClass
{
public static void main(String[] args)
{
Pattern p = Pattern.compile("[0-9]+");
Matcher m = p.matcher("004-034556");
while(m.find())
{
System.out.println(m.group());
}
}
}
WAY 2: Using the string split method
public class TrialClass
{
public static void main(String[] args)
{
String temp = "004-034556";
String [] arrString = temp.split("-");
for(String splitString:arrString)
{
System.out.println(splitString);
}
}
}

You can simply use StringTokenizer to split a string into two or more parts, whatever the delimiter:
StringTokenizer st = new StringTokenizer("004-034556", "-");
while(st.hasMoreTokens())
{
System.out.println(st.nextToken());
}

Check out the split() method in the String class on javadoc.
https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String)
String data = "004-034556-1212-232-232";
int cnt = 1;
for (String item : data.split("-")) {
System.out.println("string "+cnt+" = "+item);
cnt++;
}
There are many examples of splitting a string here, but I have optimized the code a little.

String str = "004-034556";
String[] sTemp = str.split("-"); // '-' is the delimiter
String string1 = sTemp[0]; // 004
String string2 = sTemp[1]; // 034556

I just wanted to write an algorithm instead of using Java built-in functions:
public static List<String> split(String str, char c){
List<String> list = new ArrayList<>();
StringBuilder sb = new StringBuilder();
for (int i = 0; i < str.length(); i++){
if(str.charAt(i) != c){
sb.append(str.charAt(i));
}
else{
if(sb.length() > 0){
list.add(sb.toString());
sb = new StringBuilder();
}
}
}
if(sb.length() >0){
list.add(sb.toString());
}
return list;
}

You can use the method split:
public class Demo {
public static void main(String args[]) {
String str = "004-034556";
if ((str.contains("-"))) {
String[] temp = str.split("-");
for (String part:temp) {
System.out.println(part);
}
}
else {
System.out.println(str + " does not contain \"-\".");
}
}
}

To split a string, use String.split(regex). Review the following example:
String data = "004-034556";
String[] output = data.split("-");
System.out.println(output[0]);
System.out.println(output[1]);
Output
004
034556
Note:
This split (regex) takes a regex as an argument. Remember to escape the regex special characters, like period/dot.

String s = "TnGeneral|DOMESTIC";
String a[]=s.split("\\|"); // | is a regex metacharacter, so it must be escaped
System.out.println(a[0]);
System.out.println(a[1]);
Output:
TnGeneral
DOMESTIC

String s="004-034556";
for(int i=0;i<s.length();i++)
{
if(s.charAt(i)=='-')
{
System.out.println(s.substring(0,i));
System.out.println(s.substring(i+1));
}
}
As mentioned by everyone, split() is the best option which may be used in your case. An alternative method can be using substring().

How to clean a file, replacing unwanted separators, operators, string literals

I'm working on a concordance problem where I must: "Clean the file. For this, remove all string literals (anything enclosed in double quotes, the second of which is not preceded by an odd number of backslashes), remove all // comments, remove all separator characters (look these up), and operators (look these up). Do not worry about ".class literals" (we will assume they will not appear in the input file)."
I think I know how the replaceAll() method works, but I don't know what's going to be in the file. For starters, how would I go about removing all string literals? Is there a way to replace everything within two double quotes? I.E. String someString = "I want to remove this from a file plz help me, thx";
I've currently put each line of text within an ArrayList of Strings.
Here's what I've got: http://pastebin.com/N84QdLqz

I think I've come up with a solution for your string literal regex. Something like:
inputLine.replaceAll("\"([^\\\\\"]*(\\\\\")*)*([\\\\]{2})*(\\\\\")*[^\"]*\"", "\"\"");
should do the trick. The regex is actually significantly more readable if you print it out to the console after Java has had a chance to escape all of the characters. So if you call System.out.println() with that String, you'll get:
"([^\\"]*(\\")*)*([\\]{2})*(\\")*[^"]*"
I'll break down the original regex to explain. First there's:
"\"([^\\\\\"]*(\\\\\")*)*
This says to match a quote character (") followed by 0 or more patterns of characters that are neither backslashes (\) nor quote characters (") which are followed by 0 or more escaped quotes (\"). As you can see, since \ is typically used as an escape character in Java, any regexes using them become pretty verbose.
([\\\\]{2})*
This says to next match 0 or more sets of 2 (i.e. even-numbered amounts) of backslashes.
(\\\\\")*
This says to match a single backslash followed by a quote character, and to find 0 or more of those together.
[^\"]*\"
This says to match anything that is not a quote character, 0 or more times, followed by a quote character.
I tested my regex with an example similar to what you were asking for:
string literals (anything enclosed in double quotes, the second of which is not preceded by an odd number of backslashes)
Emphasis mine. So by this statement, if the first quote in a literal has a backslash in front of it, it doesn't matter.
String s = "This is "a test\" + "So is this"
Applying the regex with replaceAll and a replacement of \"\", you'll get:
String s = ""a test\""So is this"
which should be correct. You can completely remove the matching literal's quotes, if you want, by calling replaceAll with a replacement of "":
String s = a test\So is this"
Alternately, using this regex on something much less contrived to cause headaches:
String s = "This is \"a test\\" + "So is this"
will return:
String s = +

You can do something like this:
private static final String REGEX = "(\"[\\w|\\s]*\")";
private static Pattern P;
private static Matcher M;
public static void main(String args[]){
P = Pattern.compile(REGEX);
//.... your code here ....
}
public static ArrayList<String> readStringsFromFile(String fileName) throws FileNotFoundException
{
Scanner scanner = null;
scanner = new Scanner(new File(fileName));
ArrayList<String> list = new ArrayList<>();
String str = new String();
try
{
while(scanner.hasNext())
{
str = scanner.nextLine();
str = cleanLine(str);//clean the line after read
list.add(str);
}
}
catch (InputMismatchException ex)
{
}
return list;
}
public static String cleanLine(String line) {
int index;
//remove comment lines
index = line.indexOf("//");
if (index != -1) {
line = line.substring(0, index);
}
//remove everything within two double quotes
M = P.matcher(line);
String tmp = "";
while(M.find()) {
tmp = line.substring(0,M.start());
tmp += line.substring(M.end());
line = tmp;
M = P.matcher(line);
}
return line;
}
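One thing to watch in line-cleaning code like this is the order of the steps: cutting the line at // before removing string literals truncates any line whose literal contains //, such as a URL. The sketch below removes literals first. It uses a commonly seen escape-aware pattern (a quote, then any run of non-quote, non-backslash characters or backslash-escaped characters, then a quote) rather than either regex from the answers, and it assumes literals do not span lines. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class LineCleaner {
    // Regex: "(?:[^"\\]|\\.)*"  (an escaped character always counts as part of the literal)
    private static final Pattern STRING_LITERAL =
            Pattern.compile("\"(?:[^\"\\\\]|\\\\.)*\"");

    static String clean(String line) {
        // Step 1: drop string literals so that "//" inside them cannot be mistaken for a comment.
        String withoutLiterals = STRING_LITERAL.matcher(line).replaceAll("");
        // Step 2: cut off a trailing // comment, if any.
        int comment = withoutLiterals.indexOf("//");
        return comment >= 0 ? withoutLiterals.substring(0, comment) : withoutLiterals;
    }

    public static void main(String[] args) {
        System.out.println(clean("String url = \"http://x\"; // note"));
        System.out.println(clean("print(\"a \\\"b\\\" c\") + 1"));
    }
}
```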

java regex replaceAll with negated groups

I'm trying to use the String.replaceAll() method with regex to only keep letter characters and ['-_]. I'm trying to do this by replacing every character that is neither a letter nor one of the characters above by an empty string.
So far I have tried something like this (in different variations) which correctly keeps letters but replaces the special characters I want to keep:
current = current.replaceAll("(?=\\P{L})(?=[^\\'-_])", "");

Make it simpler:
current = current.replaceAll("[^a-zA-Z'_-]", "");
Explanation:
The character class matches any char that is not a to z, A to Z, ', _, or -, and replaceAll() replaces every matched char with nothing.
Tested input : "a_zE'R-z4r#m"
Output : a_zE'R-zrm

You don't need lookahead, just use negated regex:
current = current.replaceAll("[^\\p{L}'_-]+", "");
[^\\p{L}'_-] will match anything that is not a letter (unicode) or single quote or underscore or hyphen.
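The difference between the ASCII ranges and \p{L} only shows up with non-ASCII letters. A small sketch, with made-up names and sample input, comparing the two classes:

```java
final class LetterClassDemo {
    // Keeps any Unicode letter plus ' _ and -.
    static String keepUnicodeLetters(String s) {
        return s.replaceAll("[^\\p{L}'_-]+", "");
    }

    // Keeps only ASCII letters plus ' _ and -.
    static String keepAsciiLetters(String s) {
        return s.replaceAll("[^a-zA-Z'_-]", "");
    }

    public static void main(String[] args) {
        String input = "Zo\u00eb_O'Neil-99!";
        System.out.println(keepUnicodeLetters(input)); // keeps the accented letter
        System.out.println(keepAsciiLetters(input));   // drops the accented letter
    }
}
```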

Your regex is too complicated. Just specify the characters you want to keep, and use ^ to negate, so [^a-z'_-] means "anything but these".
public class Replacer {
public static void main(String[] args) {
System.out.println("with 1234 &*()) -/.,>>?chars".replaceAll("[^a-zA-Z'_-]", ""));
}
}

You can try this:
String str = "Se#rbi323a'and_Eur$ope#-t42he-[A%merica]";
str = str.replaceAll("[\\d+\\p{Punct}&&[^-'_\\[\\]]]+", "");
System.out.println("str = " + str);
The result is:
str = Serbia'and_Europe-the-[America]

How to get alphabets only from given alpha-numeric word in java?

Sorry if this is a silly question, but I need to know about this.
I have a word made up of letters, digits, and special characters, and I need to extract only the letters, dropping the digits and special characters. Is there a built-in function in Java that extracts only the letters?
E.g. String word="te123##st";
I need "test" only.

This solution works with accented/non-ASCII characters:
"te123##st\néàø_".replaceAll("[\\p{Digit}\\p{Punct}\\p{Space}]", "");

try this word.replaceAll("[^a-zA-Z]", "");

This will remove all non-alphabetic characters, but it will also remove accented characters.
String word = "te123##st";
word = word.replaceAll("[^\\p{Alpha}]", "");
// or word = word.replaceAll("[\\P{Alpha}]", "");
See apidoc reference.

try
word = word.replaceAll("\\P{Alpha}", "");

String word = "te123##st";
word = word.replaceAll("[\\W\\d._]", "");

try this:
word = word.replaceAll("[\\d##_]", "");

- I won't make this complicated with regex; built-in Java functionality is enough to answer this.
- Use the toCharArray() method to break the String into char elements, then use the Character class's isAlphabetic() method to check whether each one is a letter.
public class T1 {
    public static void main(String[] args){
        String s = "te123##st";
        // Collect the letters from the whole string, not just a prefix
        StringBuilder extracted = new StringBuilder();
        for(char a : s.toCharArray()){
            if(Character.isAlphabetic(a)){
                System.out.println(a+" is a letter");
                extracted.append(a);
            }else{
                System.out.println(a+" is not a letter");
            }
        }
        System.out.println("The extracted String is: "+extracted);
    }
}
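On Java 8 or later, the same character-by-character idea can also be written as a stream over code points. This is a sketch with hypothetical names; it uses Character.isLetter, which is close to but not identical to isAlphabetic.

```java
final class LetterExtractor {
    // Keeps only the code points that Character.isLetter accepts.
    static String lettersOnly(String word) {
        return word.codePoints()
                .filter(Character::isLetter)
                .collect(StringBuilder::new,
                         StringBuilder::appendCodePoint,
                         StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(lettersOnly("te123##st")); // prints test
    }
}
```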
