Empty Strings within a non empty String [duplicate] - java

This question already has answers here:
Replace with empty string replaces newChar around all the characters in original string
(4 answers)
Closed 6 years ago.
I'm confused by the following code:
public class StringReplaceWithEmptyString
{
public static void main(String[] args)
{
String s1 = "asdfgh";
System.out.println(s1);
s1 = s1.replace("", "1");
System.out.println(s1);
}
}
And the output is:
asdfgh
1a1s1d1f1g1h1
My first thought was that every character in a String has an empty String "" on both sides. But if that were the case, there should be two '1's after 'a' in the second line of the output (one for the end of 'a' and one for the start of 's').
I then checked whether a String is represented as a char[], following these links: "In Java, is a String an array of chars?" and "String representation in Java". The answer I got was yes.
So I tried to assign an empty character '' to a char variable, but it gives me a compiler error:
Invalid character constant
I get the same compiler error when I try it in a char[]:
char[] c = {'','a','','s'}; // CTE
So I'm confused about three things:
How is an empty String represented by a char[]?
Why am I getting that output for the above code?
How is the String s1 represented as a char[] when it is first initialized?
Sorry if I'm wrong in any part of my question.

Just adding some more explanation to Tim Biegeleisen's answer.
As of Java 8, the code of the replace method in the java.lang.String class is:
public String replace(CharSequence target, CharSequence replacement) {
return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
Here you can see that the replacement is carried out by a regex Pattern and Matcher. The empty pattern "" produces a zero-length match, and such a match exists at every position in the string: before the first character, between each pair of adjacent characters, and after the last character.
So, behind the scenes, your code is executed as follows:
Pattern.compile("".toString(), Pattern.LITERAL).matcher("asdfgh").replaceAll(Matcher.quoteReplacement("1".toString()));
The output then becomes:
1a1s1d1f1g1h1

Going with Andy Turner's great comment, your call to String#replace() is actually implemented using a regex Matcher's replaceAll(). As such, there is a regex replacement happening here. Matches occur before the first character, in between each pair of characters in the string, and after the last character.
^|a|s|d|f|g|h|$
Each pipe marks a position where the empty string "" matches; ^ and $ mark the start and the end of the string.
The match you are making is a zero-length match. In Java's regex implementation, this behaves as the example above shows, namely matching each inter-character position and the positions before the first and after the last characters.
Here is a reference which discusses zero length matches in more detail: http://www.regexguru.com/2008/04/watch-out-for-zero-length-matches/
A zero-width or zero-length match is a regular expression match that does not match any characters. It matches only a position in the string. E.g. the regex \b matches between the 1 and , in 1,2.
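To make the positions concrete, here is a minimal sketch that lists where the empty pattern matches in the string from the question. The class name EmptyMatchPositions and the helper positions() are made up for this illustration; only Pattern, Matcher and String.replace come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyMatchPositions {
    // Hypothetical helper: collects the start index of every match
    // of the empty literal pattern in the given input.
    static List<Integer> positions(String input) {
        Matcher m = Pattern.compile("", Pattern.LITERAL).matcher(input);
        List<Integer> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        // Seven zero-length matches for a six-character string.
        System.out.println(positions("asdfgh"));        // [0, 1, 2, 3, 4, 5, 6]
        // One replacement is inserted at each of those positions.
        System.out.println("asdfgh".replace("", "1"));  // 1a1s1d1f1g1h1
    }
}
```

There is one match per position, not two per character, which is why only a single 1 appears between 'a' and 's'.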

This is because replace() performs a regex match using the target and replacement you pass to it:
public String replace(CharSequence target, CharSequence replacement) {
return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
Replaces each substring of this string that matches the literal target
sequence with the specified literal replacement sequence. The
replacement proceeds from the beginning of the string to the end, for
example, replacing "aa" with "b" in the string "aaa" will result in
"ba" rather than "ab".
Parameters:
target - The sequence of char values to be replaced
replacement - The replacement sequence of char values
Returns: The resulting string
Throws: NullPointerException if target or replacement is null.
Since: 1.5
Please read more at the link below ... (Also browse through the source code).
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/String.java#String.replace%28java.lang.CharSequence%2Cjava.lang.CharSequence%29
A regex such as "" matches at every position where an empty string can be found. In this case, that is the position before the first character and the position after every character in the string.
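On the char[] part of the question: there is no "empty character", so an empty String is simply a String with zero chars, and "asdfgh" holds exactly six chars with nothing in between. A small sketch using toCharArray() shows this; the class name EmptyStringChars and the helper charCount() are invented for the example.

```java
class EmptyStringChars {
    // Hypothetical helper: number of chars a String exposes as a char[].
    static int charCount(String s) {
        return s.toCharArray().length;
    }

    public static void main(String[] args) {
        System.out.println(charCount(""));        // 0
        System.out.println(charCount("asdfgh"));  // 6
        // The empty string is found at every index from 0 to length().
        System.out.println("asdfgh".indexOf("", 3));  // 3
    }
}
```

The empty matches reported by the regex engine are positions between chars, not extra elements stored in the array.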

Related

How to match two string using java Regex

String 1 = abc/{ID}/plan/{ID}/planID
String 2 = abc/1234/plan/456/planID
How can I match these two strings using Java regex so that it returns true? Basically {ID} can contain anything. Java regex should match abc/{anything here}/plan/{anything here}/planID
If your "{anything here}" may be empty, you can use .*. The . matches any character (except line terminators by default), and * means zero or more repetitions of the preceding element. So .* means "match a string of any length, including zero, made up of any characters". If {anything here} must contain at least one character, use + instead of *, which works the same way but requires at least one repetition.
My suggestion: abc/.+/plan/.+/planID
If {ID} can contain anything, I assume it can also be empty.
So this regex should work:
str.matches("^abc.*plan.*planID$");
^abc at the beginning
.* Zero or more of any Character
planID$ at the end
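The patterns above are quite permissive: .* and .+ also match slashes, so extra path segments slip through. As a sketch of a stricter variant, and assuming (this is not stated in the question) that an ID is non-empty and never contains a '/', each placeholder can be written as [^/]+. The class PathTemplateMatch and the method matchesTemplate() are hypothetical names.

```java
class PathTemplateMatch {
    // Assumption: each {ID} slot is non-empty and contains no '/'.
    static boolean matchesTemplate(String path) {
        // String.matches() requires the whole input to match, so no ^ or $ is needed.
        return path.matches("abc/[^/]+/plan/[^/]+/planID");
    }

    public static void main(String[] args) {
        System.out.println(matchesTemplate("abc/1234/plan/456/planID"));   // true
        System.out.println(matchesTemplate("abc/{ID}/plan/{ID}/planID"));  // true
        System.out.println(matchesTemplate("abc//plan/456/planID"));       // false
        System.out.println(matchesTemplate("abc/1/2/plan/456/planID"));    // false
    }
}
```

If IDs may legitimately contain slashes or be empty, the looser patterns from the answers are the better fit.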
Here is a small piece of code; check it and adapt it to your requirements. It works for this input; try it with your other test cases, and if any of them fails, please comment with that test case. I am using a regex specifically because you asked to match using a Java regex.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class MatchUsingRejex
{
public static void main(String args[])
{
// Create a pattern to be searched
Pattern pattern = Pattern.compile("abc/.+/plan/.+/planID");
// Check whether the pattern is found in the input
Matcher isMatch = pattern.matcher("abc/1234/plan/456/planID");
if (isMatch.find())
System.out.println("Yes");
else
System.out.println("No");
}
}
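One detail worth knowing about the code above: Matcher.find() succeeds if the pattern occurs anywhere in the input, while Matcher.matches() requires the entire input to match. The following sketch contrasts the two with the same pattern; the class FindVersusMatches and its two helpers are names chosen for this illustration.

```java
import java.util.regex.Pattern;

class FindVersusMatches {
    static final Pattern TEMPLATE = Pattern.compile("abc/.+/plan/.+/planID");

    // True if the pattern occurs somewhere inside the input.
    static boolean foundAnywhere(String s) {
        return TEMPLATE.matcher(s).find();
    }

    // True only if the whole input matches the pattern.
    static boolean matchesWhole(String s) {
        return TEMPLATE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String padded = "xx/abc/1234/plan/456/planID/yy";
        System.out.println(foundAnywhere(padded));  // true
        System.out.println(matchesWhole(padded));   // false
        System.out.println(matchesWhole("abc/1234/plan/456/planID"));  // true
    }
}
```

If the goal is to validate the complete string, matches() (or anchoring the pattern with ^ and $) is probably what is wanted.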
If the line always starts with 'abc' and ends with 'planID', then the following will work:
String s1 = "abc/{ID}/plan/{ID}/planID";
String s2 = "abc/1234/plan/456/planID";
String pattern = "(?i)abc(?:/\\S+)+planID$";
boolean b1 = s1.matches(pattern);
boolean b2 = s2.matches(pattern);

Replace characters in string without if else

In Java, is there a way to replace specific special characters with other special characters throughout a text without using if/else?
E.g.:
String s = "abcd&c!&%^";
Replace & with ~
Replace ! with ¬, etc., in the example string above.
String has a replace function, so you can do s = s.replace('&','~');
public String replace(char oldChar, char newChar)
Returns a new string resulting from replacing all occurrences of oldChar in this string with newChar.
If the character oldChar does not occur in the character sequence represented by this String object, then a reference to this String object is returned. Otherwise, a new String object is created that represents a character sequence identical to the character sequence represented by this String object, except that every occurrence of oldChar is replaced by an occurrence of newChar.
String.replace(char oldChar, char newChar);
All things you can do with String: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/String.html
I recommend reading these docs whenever you work with anything in Java you don't know, such as arrays, lists, and so on.
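When there are many pairs to replace, chaining replace() calls still works, but a lookup table keeps the mapping in one place and needs no if/else. This is only a sketch: the class CharTable, the TABLE map and translate() are invented names, and the table contains just the two pairs mentioned in the question.

```java
import java.util.HashMap;
import java.util.Map;

class CharTable {
    // Hypothetical mapping table holding the pairs from the question.
    static final Map<Character, Character> TABLE = new HashMap<>();
    static {
        TABLE.put('&', '~');
        TABLE.put('!', '\u00AC'); // the NOT SIGN used in the question
    }

    // Replaces every char that has an entry in TABLE; others are kept as-is.
    static String translate(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            char mapped = TABLE.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "abcd&c!&%^";
        System.out.println(translate(s));
        // The chained form gives the same result for these two pairs.
        System.out.println(s.replace('&', '~').replace('!', '\u00AC'));
    }
}
```

Unlike chained calls, the table translates each character exactly once, so a replacement character is never replaced again by a later rule.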

Remove duplicated characters from String using regex keeping first occurrences

I know how to remove duplicated characters from a String and keeping the first occurrences without regex:
String method(String s){
String result = "";
for(char c : s.toCharArray()){
result += result.contains(c+"")
? ""
: c;
}
return result;
}
// Example input: "Type unique chars!"
// Output: "Type uniqchars!"
I know how to remove duplicated characters from a String and keeping the last occurrences with regex:
String method(String s){
return s.replaceAll("(.)(?=.*\\1)", "");
}
// Example input: "Type unique chars!"
// Output: "Typnique chars!"
As for my question: Is it possible, with a regex, to remove duplicated characters from a String, but keep the first occurrences instead of the last?
As for why I'm asking: I came across this codegolf answer using the following function (based on the first example above):
String f(char[]s){String t="";for(char c:s)t+=t.contains(c+"")?"":c;return t;}
and I was wondering if this can be done shorter with a regex and String input. But even if it's longer, I'm just curious in general if it's possible to remove duplicated characters from a String with a regex, while keeping the first occurrences of each character.
It is not the shortest option, and it involves more than just a regex, but it is still an option. You can reverse the string, run the regex you already have, and then reverse the result back.
public static String g(StringBuilder s){
return new StringBuilder(
s.reverse().toString()
.replaceAll("(?s)(.)(?=.*\\1)", ""))
.reverse().toString();
}
See the online Java demo
Note: I suggest adding (?s) (the inline form of the Pattern.DOTALL flag) to the regex so that . matches any character, including a newline (by default, . does not match line terminators).
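As a quick check of the reverse-then-filter idea, the sketch below applies it to a plain String and compares it with a non-regex alternative based on IntStream.distinct(). The distinct() variant is not from the answers above; it is swapped in here only as a cross-check. The class KeepFirstOccurrences and both method names are hypothetical.

```java
class KeepFirstOccurrences {
    // Reverse, drop every char that occurs again later, reverse back.
    static String viaReversedRegex(String s) {
        String reversed = new StringBuilder(s).reverse().toString();
        String filtered = reversed.replaceAll("(?s)(.)(?=.*\\1)", "");
        return new StringBuilder(filtered).reverse().toString();
    }

    // Non-regex cross-check: distinct() keeps the first occurrence
    // of each char value when the stream is ordered.
    static String viaDistinct(String s) {
        return s.chars()
                .distinct()
                .collect(StringBuilder::new,
                         StringBuilder::appendCodePoint,
                         StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        String input = "Type unique chars!";
        System.out.println(viaReversedRegex(input));  // Type uniqchars!
        System.out.println(viaDistinct(input));       // Type uniqchars!
    }
}
```

Both versions work on UTF-16 char values, so this sketch assumes the input does not rely on characters outside the Basic Multilingual Plane.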

Java - replace string: charAt VS substring

I'm making a simple test, just removing a char from string. It goes like this:
String str = "kitten";
int i = 2;
//substring version - works good
System.out.println(str.replaceFirst(str.substring(i, i+1), ""));
//charAt (tried as regex):
System.out.println(str.replaceFirst("[str.charAt(i)]", ""));
//charAt (tried as char):
System.out.println(str.replaceFirst("str.charAt(i)", ""));
The substring version works well; the charAt versions work only if i=1. What is wrong here?
In your second and third snippets, you are not replacing the result of the charAt() call but the literal text "str.charAt(i)"; notice that it is inside quotes. Also, charAt() returns a char, so you have to convert it to a String before using it.
Try this:
System.out.println(str.replaceFirst("[" + String.valueOf(str.charAt(i)) + "]", ""));
System.out.println(str.replaceFirst(String.valueOf(str.charAt(i)), ""));
System.out.println(str.replaceFirst("str.charAt(i)", ""));
doesn't do what you think it does. It isn't looking for the character at index i. It is looking for the first substring that matches the regex pattern "str.charAt(i)". Similar issues exist with your other replaceFirst call.
That means an input such as "strXcharAti" would match the pattern "str.charAt(i)" (the dot matches any character and the parentheses only form a group), but nothing in "kitten" does, whatever the value of i. The text between the double quotes is not interpreted as Java code.
System.out.println(str.replaceFirst("str.charAt(i)", ""));
This line replaces the first match of the regex "str.charAt(i)" with "" (if there is one) in the string str.
You need to read more about replace() and charAt() here.
Using the first example, which you say "works good", I'd expect this output:
kiten
ktten
kitten
str.substring(i, i+1) returns "t" (the character at index i, i.e. the 3rd character). You then pass this into str.replaceFirst, which replaces the first occurrence of "t" with "", effectively erasing it.
What you are doing in the second one is odd: you are invoking replaceFirst with the regex "[str.charAt(i)]", which means "replace the first occurrence of any of the characters inside the square brackets". Inside a character class, the dot and the round brackets are ordinary literal characters, so you may as well be saying "match any of the characters a, A, c, h, i, r, s, t" (I alphabetised, removed duplicates and left out '(', ')' and '.', which do not occur in "kitten" anyway). The first character of "kitten" that matches happens to be 'i', so that character is removed.
The final example looks for a match of the regex "str.charAt(i)", which is of course nowhere to be found in "kitten"; you may as well be searching for "dog".
The following code is equivalent to what you have just done:
String str = "kitten";
int i = 2;
// Eliminated redundant regex replacement:
System.out.println(new StringBuffer(str.substring(0, i)).append(str.substring(i+1)));
// Search for any of the characters in "str.charAt(i)"
System.out.println(str.replaceFirst("[aAchirst]", ""));
// Search for non-matching string
System.out.println(str.replaceFirst("dog", ""));
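If the intent is to remove the character at a given index, it may be simpler to skip regex entirely. The sketch below contrasts StringBuilder.deleteCharAt() with a replaceFirst() call whose argument is quoted via Pattern.quote(), so that regex metacharacters are taken literally. The class RemoveCharAtIndex and its two methods are names made up for this example.

```java
import java.util.regex.Pattern;

class RemoveCharAtIndex {
    // Removes exactly the char at index i.
    static String byIndex(String s, int i) {
        return new StringBuilder(s).deleteCharAt(i).toString();
    }

    // Removes the FIRST occurrence of the char found at index i,
    // which is not necessarily the one at index i.
    static String byFirstOccurrence(String s, int i) {
        String literal = Pattern.quote(String.valueOf(s.charAt(i)));
        return s.replaceFirst(literal, "");
    }

    public static void main(String[] args) {
        System.out.println(byIndex("kitten", 2));            // kiten
        System.out.println(byFirstOccurrence("kitten", 2));  // kiten
        System.out.println(byIndex("banana", 3));            // banna
        System.out.println(byFirstOccurrence("banana", 3));  // bnana
    }
}
```

The two approaches agree on "kitten" only because the 't' at index 2 is also the first 't' in the string; "banana" shows where they diverge.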

Java: Parsing a string based on delimiter

I have to design an interface that fetches data from a machine and then plots it. I have already written the fetch part, and it returns a string of the format A&B#.13409$13400$13400$13386$13418$13427$13406$13383$13406$13412$13419$00000$00000$
The first five characters, A&B#., are the identifier. Please note that the fifth character is a line feed, i.e. ASCII 0xA.
The function I have written -
public static boolean checkStart(String str,String startStr){
String Initials = str.substring(0,5);
System.out.println("Here is start: " + Initials);
if (startStr.equals(Initials))
return true;
else
return false;
}
shows Here is start: A&B#. which is correct.
Question 1:
Why do we need str.substring(0,5)? When I use str.substring(0,4), it shows only "Here is start: A&B#", i.e. the line feed is missing. Why does the line feed make this difference?
Further, to extract the remaining string I have to use s.substring(5,s.length()) instead of s.substring(6,s.length()),
i.e.
s.substring(6,s.length()) produces 3409$13400$13400$13386$13418$13427$13406$13383$13406$13412$13419$00000$00000$ i.e. it is missing the first character after the identifier A&B#.
Question 2:
My parsing function is:
public static String[] StringParser(String str,String del){
String[] sParsed = str.split(del);
for (int i=0; i<sParsed.length; i++) {
System.out.println(sParsed[i]);
}
return sParsed;
}
It parses correctly for a string such as String s = "A&B#.13409/13400/13400/13386/13418/13427/13406/13383/13406/13412/13419/00000/00000/"; when the function is called as String[] tokens = StringParser(rightChannelString,"/");
But for a string such as String s = "A&B#.13409$13400$13400$13386$13418$13427$13406$13383$13406$13412$13419$00000$00000$", the call String[] tokens = StringParser(rightChannelString,"$"); does not split the string at all.
I am not able to figure out the reason for this behaviour. Can anyone please let me know the solution?
Thanks
Regarding question 1, the Java API says that the substring method takes 2 parameters:
beginIndex the begin index, inclusive.
endIndex the end index, exclusive.
So in your example
String: A&B#.134
Index: 01234567
substring(0,4) covers indexes 0 to 3, so it returns A&B#; that's why you have to pass 5 as the second parameter to include your line feed.
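The index arithmetic can be checked with a short sketch. The SAMPLE value is a shortened, made-up stand-in for the machine output, with the fifth character written as the line feed \n as described in the question; SubstringBounds, identifier() and payload() are hypothetical names.

```java
class SubstringBounds {
    // Shortened, hypothetical sample: 4 visible chars + line feed + data.
    static final String SAMPLE = "A&B#\n13409$13400$";

    // Indexes 0..4 inclusive, because the end index is exclusive.
    static String identifier(String s) {
        return s.substring(0, 5);
    }

    // Everything from index 5 onward.
    static String payload(String s) {
        return s.substring(5);
    }

    public static void main(String[] args) {
        System.out.println(identifier(SAMPLE).length());  // 5
        System.out.println(SAMPLE.substring(0, 4));       // A&B#
        System.out.println(payload(SAMPLE));              // 13409$13400$
        System.out.println(SAMPLE.substring(6));          // 3409$13400$
    }
}
```

The line feed is an ordinary character occupying index 4, so there is nothing special about it; the payload simply starts at index 5.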
Regarding question 2, the split method takes a regular expression as its parameter, and $ is a special character in regular expressions. To match a literal dollar sign you have to escape it with the \ character (and since \ is itself a special character in Java string literals, you must escape that too).
String[] tokens = StringParser(rightChannelString,"\\$");
Q1: review the description of substring in the documentation:
Returns a new string that is a substring of this string.
The substring begins at the specified beginIndex and extends to the
character at index endIndex - 1. Thus the length of the substring
is endIndex-beginIndex.
Q2: the split method takes a regular expression as the separator. $ is a special character in regular expressions: it matches at the end of the line.
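When the delimiter arrives as a parameter, as in StringParser, escaping it by hand is error-prone. One option is Pattern.quote(), which turns any string into a literal pattern. This is a sketch with invented names (SplitOnDollar, splitLiteral) and a shortened sample string.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitOnDollar {
    // Splits on the delimiter taken literally, whatever characters it contains.
    static String[] splitLiteral(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String data = "13409$13400$00000$";
        // Unescaped "$" only matches at the end, so nothing is split.
        System.out.println(data.split("$").length);                    // 1
        // Quoted delimiter splits on each dollar sign.
        System.out.println(Arrays.toString(splitLiteral(data, "$")));  // [13409, 13400, 00000]
    }
}
```

Trailing empty strings are dropped by split(), which is why the final $ does not produce an extra empty token.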
