Regex to remove special characters in java

I have a string with a couple of special characters and need to remove only a few of them (~ and `). I have written the code below, but when I print the split strings, I get empty strings along with the values.
String str = "ABC123-xyz`~`XYZ 1.7A";
String[] str1 = str.split("[\\~`]");
for (int i = 0; i < str1.length; i++) {
    System.out.println("str===" + str1[i]);
}
Output:
str===ABC123-xyz
str===
str===
str===XYZ 1.7A
Why are the empty strings printed as well?

You're splitting on a single special character. Split on one or more instead:
String[] str1 = str.split("[~`]+");
Note also that the tilde (~) doesn't need escaping.

That's because the .split() method returns a String array of 4 items, shown below:
String[4] { "ABC123-xyz", "", "", "XYZ 1.7A" }
Your for loop then prints every item of that array. You can skip the empty ones like this:
for (int i = 0; i < str1.length; i++) {
    if (!str1[i].isEmpty()) {
        System.out.println("str===" + str1[i]);
    }
}

The split method returns the substrings around every match of the regex. Your regex, [~`], matches a single character that is either "~" or "`".
The parts of the string separated by matches of that regex are determined as follows:
The string "ABC123-xyz" is returned because it precedes the first match, the character "`".
Between that character and the next match, "~", lies the empty string, and likewise between "~" and the second "`".
If you want a whole run of these characters to count as one delimiter, use [~`]+
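To make the difference concrete, here is a minimal sketch that runs both patterns against the string from the question. The class and method names are made up for illustration; only String.split and String.replaceAll come from the standard library. The last helper covers the case where the goal is to delete the characters rather than split on them.

```java
import java.util.Arrays;

class TildeBacktickSplit {
    // Splits on every single ~ or `, so adjacent delimiters yield empty strings.
    static String[] splitOnEach(String s) {
        return s.split("[~`]");
    }

    // Splits on runs of ~ or `, so a whole run counts as one delimiter.
    static String[] splitOnRuns(String s) {
        return s.split("[~`]+");
    }

    // Removes the two characters outright instead of splitting.
    static String strip(String s) {
        return s.replaceAll("[~`]", "");
    }

    public static void main(String[] args) {
        String str = "ABC123-xyz`~`XYZ 1.7A";
        System.out.println(Arrays.toString(splitOnEach(str))); // [ABC123-xyz, , , XYZ 1.7A]
        System.out.println(Arrays.toString(splitOnRuns(str))); // [ABC123-xyz, XYZ 1.7A]
        System.out.println(strip(str));                        // ABC123-xyzXYZ 1.7A
    }
}
```

Note that even with [~`]+ a string that starts with one of the delimiters still produces one empty string at the front of the array.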

Related

How to return only first n number of words in a sentence Java

Say I have a simple piece of text as below.
For example, this is what I have:
A simple sentence consists of only one clause. A compound sentence
consists of two or more independent clauses. A complex sentence has at
least one independent clause plus at least one dependent clause. A set
of words with no independent clause may be an incomplete sentence,
also called a sentence fragment.
I want only the first 10 words of the text above.
I'm trying to produce the following string:
A simple sentence consists of only one clause. A compound
I tried this:
bigString.split(" " ,10).toString()
But it returns the same bigString wrapped in [] like an array.
Thanks in advance.

Assume bigString is a String holding your text. The first thing to do is split the string into single words.
String[] words = bigString.split(" ");
How many words would you like to extract?
int n = 10;
Put the words back together, stopping early if there are fewer than n of them:
String newString = "";
for (int i = 0; i < n && i < words.length; i++) { newString = newString + (i > 0 ? " " : "") + words[i]; }
System.out.println(newString);
Hope this is what you needed.
If you want to know more about regular expressions (i.e. how to tell Java where to split), see here: How to split a string in Java

If you call split with a limit (yours is 10), it doesn't give you the first 10 parts and stop. It gives you the first 9 parts, and the 10th element of the array holds the rest of the input string, so printing all the elements reproduces the whole input. To get what you initially wanted, split with a limit of 11 and keep only the first 10 elements (this needs import java.util.Arrays):
String[] myArray = bigString.split(" ", 11);
String[] firstTen = Arrays.copyOf(myArray, Math.min(10, myArray.length)); // drop the 11th element, which holds the rest
String result = String.join(" ", firstTen); // join the words; calling toString() on an array does not print its elements

This will help you. It produces an array holding the first 10 words:
String[] strings = Arrays.stream(bigString.split(" "))
.limit(10)
.toArray(String[]::new);

Here is one way to get what you want:
String[] result = new String[10];
// regex \s matches a whitespace character: [ \t\n\x0B\f\r]
String[] raw = bigString.split("\\s", 11);
// the last entry of the raw array holds the rest of the text, so copy only the first 10 entries
// (this assumes the text has at least 10 words; arraycopy throws otherwise)
System.arraycopy(raw, 0, result, 0, 10);
System.out.println(Arrays.toString(result));
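The answers above can be combined into one small helper. This is only a sketch: the class name FirstWords and the method firstWords are hypothetical, and it assumes that words are separated by any run of whitespace (spaces or line breaks) and that n is not negative.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class FirstWords {
    // Returns the first n whitespace-separated words of text, joined by single spaces.
    // If text has fewer than n words, all of them are returned.
    static String firstWords(String text, int n) {
        return Arrays.stream(text.trim().split("\\s+"))
                .limit(n)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String bigString = "A simple sentence consists of only one clause. A compound sentence\n"
                + "consists of two or more independent clauses.";
        System.out.println(firstWords(bigString, 10));
        // A simple sentence consists of only one clause. A compound
    }
}
```

Joining with Collectors.joining returns a plain String, which avoids the bracketed output of Arrays.toString.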

how to remove multiple token from string array in java by split along with [ ]

How do I remove multiple tokens from a string in Java using split, including the [ ]?
String Order_Menu_Name = "[pohe-7, puri-3]";
String[] s2 = Order_Menu_Name.split("-|,");
int j = 0;
//out.println("s2.length "+s2.length);
while(j<s2.length){ }
The expected output is each value separately,
e.g. pohe 7 puri 3

Your question is not clear. Assuming that your string contains "pohe-7, puri-3", you can split it on a separator such as "," or "-" or whitespace. See below.
String Order_Menu_Name = "[pohe-7, puri-3]";
To remove "[" and "]" from the above String, you can use Java's replace method as follows:
Order_Menu_Name = Order_Menu_Name.replace("[", "");
Order_Menu_Name = Order_Menu_Name.replace("]", "");
You can replace the above two lines with a single call that uses a regular expression matching both brackets, if you wish.
After removing those characters, you can split your string as follows:
String[] chunks = Order_Menu_Name.split(",");
int i = 0;
while (i < chunks.length) {
    System.out.println(chunks[i].trim()); // trim the space left after each comma
    i++;
}
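To get every value on its own, as the question asks (pohe 7 puri 3), the bracket removal and the split can be done with two regular expressions. The following is a sketch under the assumption that the input always looks like "[name-number, name-number]"; the class name MenuTokens and the method tokens are hypothetical.

```java
class MenuTokens {
    // Strips one leading "[" and one trailing "]", then splits on "-" or ","
    // together with any whitespace that follows the separator.
    static String[] tokens(String s) {
        String inner = s.trim().replaceAll("^\\[|\\]$", "");
        return inner.split("[-,]\\s*");
    }

    public static void main(String[] args) {
        String orderMenuName = "[pohe-7, puri-3]";
        System.out.println(String.join(" ", tokens(orderMenuName))); // pohe 7 puri 3
    }
}
```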
You can pass one or two parameters to Java's split() method: the first is the regular expression that defines the delimiter, and the optional second one is limit, the maximum number of chunks to return. See below:
public String[] split(String regex, int limit)
or
public String[] split(String regex)
For example:
String str = "Welcome-to-Stackoverflow.com";
for (String retval : str.split("-", 3)) {
    System.out.println(retval);
}
When splitting the above string using the separator "-" you should get 3 chunks, as follows:
Welcome
to
Stackoverflow.com
If you pass split a limit of 2 instead of 3, you get the following:
Welcome
to-Stackoverflow.com
Notice that "to-Stackoverflow.com" is returned as is because we limited the number of chunks to 2.

String.replace() not replacing all occurrences

I have a very long string which looks similar to this.
355,356,357,358,359,360,361,382,363,364,365,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,368,369,313,370,371,372,373,374,375,376,377,378,379,380,381,382,382,382,382,382,382,383,384,385,380,381,382,382,382,382,382,386,387,388,389,380,381,382,382,382,382,382,382,390,391,380,381,382,382,382,382,382,392,393,394,395,396,397,398,399,....
I tried the following code to remove the number 382 from the string:
String str = "355,356,357,358,359,360,361,382,363,364,365,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,368,369,313,370,371,372,373,374,375,376,377,378,379,380,381,382,382,382,382,382,382,383,384,385,380,381,382,382,382,382,382,386,387,388,389,380,381,382,382,382,382,382,382,390,391,380,381,382,382,382,382,382,392,393,394,395,396,397,398,399,....";
str = str.replace(",382,", ",");
But it seems that not all occurrences are replaced. The string originally had more than 3000 occurrences, and about 630 were still left after replacing.
Is String.replace() limited in what it can do? If so, is there another way to achieve what I need?

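You need to match each 382 as a whole number and remove the comma after it as well (if one exists, which it won't when 382 is last in the list):
str = str.replaceAll("\\b382\\b,?", "").replaceAll(",$", "");
Note the \b word boundaries, which prevent matching inside longer numbers such as 1382 or 3820. The second call strips the comma that is left behind when 382 was the last entry.
The above will convert:
382,111,382,1382,222,382
to:
111,1382,222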
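As a self-contained check of the word-boundary idea, here is a small sketch. The class name RemoveNumber and the method remove382 are made up for illustration, and the pattern used here has a \b on both sides of 382 plus a second pass that drops a trailing comma, which is an assumption about what the cleaned-up list should look like.

```java
class RemoveNumber {
    // Removes every standalone 382 from a comma-separated list.
    // \b on both sides keeps 1382 and 3820 intact; the second call strips
    // the comma left behind when 382 was the last entry.
    static String remove382(String csv) {
        return csv.replaceAll("\\b382\\b,?", "").replaceAll(",$", "");
    }

    public static void main(String[] args) {
        System.out.println(remove382("382,111,382,1382,3820,222,382")); // 111,1382,3820,222
    }
}
```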

I think the issue is your first argument to replace(), in particular the commas before and after 382. If you have "382,382,383", you will only match the inner ",382," and leave the initial one behind. Try:
str = str.replace("382,", "");
This will fail to match a "382" at the very end, though, as it does not have a comma after it.
A full solution might entail two method calls:
str = str.replace("382", ""); // Remove all instances of 382
str = str.replaceAll(",,+", ","); // Collapse runs of two or more commas into one
This combines the two approaches:
str = str.replaceAll("382,?", ""); // Remove 382 and an optional comma after it.
Note: both of the last two approaches leave a trailing comma if 382 is at the end, and all three also strip 382 out of longer numbers such as 1382.

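Try this, which uses a lookahead so that the comma after 382 is not consumed and consecutive occurrences all match:
str = str.replaceAll(",382(?=,|$)", "");
A 382 at the very start of the string has no comma before it and is left in place.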

First, drop the leading comma from your search string so that the matches no longer overlap. Then collapse the resulting runs of commas into a single comma using a Java regular expression.
String input = "355,356,357,358,359,360,361,382,363,364,365,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,368,369,313,370,371,372,373,374,375,376,377,378,379,380,381,382,382,382,382,382,382,383,384,385,380,381,382,382,382,382,382,386,387,388,389,380,381,382,382,382,382,382,382,390,391,380,381,382,382,382,382,382,392,393,394,395,396,397,398,399";
String result = input.replace("382,", ","); // search string without the leading comma
String result2 = result.replaceAll("[,]+", ","); // collapse runs of commas into one
System.out.println(result2);

As dave already said, the problem is that your pattern overlaps. In the string "...,382,382,..." there are two occurrences of ",382,":
"...,382,382,..."
    -----          first occurrence
        -----      second occurrence
These two occurrences overlap at the comma, and thus Java can only replace one of them. When searching for occurrences, it does not yet look at what you replace the pattern with, so it doesn't see that a new occurrence of ",382," is created when the first occurrence is replaced by the comma.
If your data is known not to contain numbers with more than 3 digits, then you might do:
str = str.replace("382,", "");
and then handle an occurrence at the end as a special case. But if your data can contain bigger numbers, then "...,1382,..." will be turned into "...,1..." (the 1 runs into the next number), which is probably not what you want.
Here are two solutions that do not have the above problem:
First, simply repeat the replacement until no changes occur anymore:
String oldString = str;
str = str.replace(",382,", ",");
while (!str.equals(oldString)) {
oldString = str;
str = str.replace(",382,", ",");
}
After that, you will have to handle possible occurrences at the end of the string.
Second, if you have Java 8, you can do a little more work yourself and use Java streams:
str = Arrays.stream(str.split(","))
.filter(s -> !s.equals("382"))
.collect(Collectors.joining(","));
This first splits the string at ",", then filters out all strings which are equal to "382", and then concatenates the remaining strings again with "," in between.
(Both code snippets are untested.)
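The overlap effect and the stream-based fix can be reproduced with a short program. This is a sketch with hypothetical names (OverlapDemo, removeEntry) and a shortened sample string rather than the original data.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class OverlapDemo {
    // Drops every list entry equal to unwanted by splitting, filtering and re-joining.
    static String removeEntry(String csv, String unwanted) {
        return Arrays.stream(csv.split(","))
                .filter(s -> !s.equals(unwanted))
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        String str = "381,382,382,382,383,1382,382";
        // Overlapping matches: in a run of entries, only every other ",382," is replaced.
        System.out.println(str.replace(",382,", ","));   // 381,382,383,1382,382
        // Comparing whole entries handles runs, the last entry, and longer numbers.
        System.out.println(removeEntry(str, "382"));     // 381,383,1382
    }
}
```

Because entries are compared as whole strings, 1382 is left untouched and no stray commas remain at either end.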

Traditional way:
String str = ",abc,null,null,0,0,7,8,9,10,11,12,13,14";
String newStr = "", word = "";
for (int i=0; i<str.length(); i++) {
if (str.charAt(i) == ',') {
if (word.equals("null") || word.equals("0"))
word = "";
newStr += word+",";
word = "";
} else {
word += str.charAt(i);
if (i == str.length()-1)
newStr += word;
}
}
System.out.println(newStr);
Output:
,abc,,,,,7,8,9,10,11,12,13,14

Java Split String Consecutive Delimiters

I have a need to split a string that is passed in to my app from an external source. This String is delimited with a caret "^" and here is how I split the String into an Array
String[] barcodeFields = contents.split("\\^+");
This works fine except that some of the passed-in fields are empty and I need to account for them. I need to insert either "" or "null" or "empty" into any missing field.
The missing fields show up as consecutive delimiters. How do I split a Java String into an array and insert a string such as "empty" as a placeholder wherever there are consecutive delimiters?

The answer by mureinik is quite close, but wrong in an important edge case: when the empty fields are at the end of the string, split drops them. To account for that you have to use:
contents.split("\\^", -1)
E.g. look at the following code:
final String line = "alpha ^beta ^^^";
List<String> fieldsA = Arrays.asList(line.split("\\^"));
List<String> fieldsB = Arrays.asList(line.split("\\^", -1));
System.out.printf("# of fieldsA is: %d\n", fieldsA.size());
System.out.printf("# of fieldsB is: %d\n", fieldsB.size());
The above prints:
# of fieldsA is: 2
# of fieldsB is: 5

String.split leaves an empty string ("") where it encounters consecutive delimiters, as long as you use the right regex. If you want to replace it with "empty", you'd have to do so yourself:
String[] split = contents.split("\\^");
for (int i = 0; i < split.length; ++i) {
    if (split[i].isEmpty()) {
        split[i] = "empty";
    }
}
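Putting the two answers together, a limit of -1 keeps the trailing empty fields and a loop fills in the placeholder. The sketch below uses a hypothetical class BarcodeFields and a made-up sample string; the placeholder text is passed in as a parameter.

```java
import java.util.Arrays;

class BarcodeFields {
    // Splits on each caret, keeps trailing empty fields (limit -1),
    // and substitutes the placeholder for every empty field.
    static String[] fields(String contents, String placeholder) {
        String[] parts = contents.split("\\^", -1);
        for (int i = 0; i < parts.length; i++) {
            if (parts[i].isEmpty()) {
                parts[i] = placeholder;
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fields("alpha^^beta^^", "empty")));
        // [alpha, empty, beta, empty, empty]
    }
}
```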

Using ^+ means one (or more consecutive) caret characters. Remove the plus:
String[] barcodeFields = contents.split("\\^");
and it won't eat empty fields. You'll get (your requested) "" for empty fields.

The following results in [blah, , bladiblah, moarblah]:
String test = "blah^^bladiblah^moarblah";
System.out.println(Arrays.toString(test.split("\\^")));
The empty field between the two carets in ^^ becomes "", the empty String.

How to convert the string into Array in Java

I have a string such as [arun, joseph, sachin, kavin]. I want to turn this text into ["arun", "joseph", "sachin", "kavin"]. All the values should be in double quotes.
I have tried to do this using the replace method, but could not get it to work. Can anyone help me resolve this?

Your question is a bit unclear. Do you want to turn a string containing
[arun, joseph, sachin, kavin]
into this string
["arun", "joseph", "sachin", "kavin"]
or do you want to turn it into an actual array containing "arun", "joseph", "sachin" and "kavin"?
Regardless, this is pretty basic string manipulation. Here's what I suggest you try:
Use substring to get rid of the first and last character.
Use split to split the string on ", ".
If you want to add '"' before and after each component in this array, you can do
for (int i = 0; i < array.length; i++)
array[i] = '"' + array[i] + '"';

You could try this:
Replace [ and ] with an empty string.
Then split on the commas.
String[] parts = string.replaceAll("^\\[|\\]$", "").split("\\s*,\\s*");
^\\[|\\]$ matches the [ at the start and the ] at the end.
The replaceAll function then replaces the matched brackets with an empty string.
Then splitting the resulting string on
\s* -> zero or more whitespace characters
, -> comma
\s* -> zero or more whitespace characters
will give you the desired output.
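Both readings of the question (an actual array, or a string with quoted values) can be covered with one short sketch. The class name QuoteElements and its two methods are hypothetical, and the code assumes the input is a single bracketed, comma-separated list whose values contain no commas or quotes themselves.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class QuoteElements {
    // Parses "[a, b, c]" into its elements.
    static String[] parse(String s) {
        return s.trim().replaceAll("^\\[|\\]$", "").split("\\s*,\\s*");
    }

    // Rebuilds the bracketed list with every element wrapped in double quotes.
    static String quoted(String s) {
        return Arrays.stream(parse(s))
                .map(e -> "\"" + e + "\"")
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        System.out.println(quoted("[arun, joseph, sachin, kavin]"));
        // ["arun", "joseph", "sachin", "kavin"]
    }
}
```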
