char[] to String sequence mismatching in Java for Unicode characters

I have a method like the one below (please ignore the code optimization issue). This method replaces Unicode characters (Bengali characters):
static String swap(String temp, char c)
{
Integer length=temp.length();
char[] charArray = temp.toCharArray();
for(int u=0;u<length;u++)
{
if(charArray[u]==c)
{
char g=charArray[u];
charArray[u]=charArray[u-1];
charArray[u-1]=g;
}
}
String string2 = new String(charArray);
return string2;
}
While debugging, I inspected the values in charArray (screenshot not preserved).
Please note that the characters there are in the sequence I want. But after new String(charArray) executes, the value stored in the String variable is mismatched (screenshot not preserved).
I want to display the string as "রেরেরে" but it displays "েরেরের", which is not what I want. Please tell me what I am doing wrong.

Note - I don't know Bengali, but I know a bit (or a lot, depending on whom you ask) about Unicode and how Java supports it. The answer assumes knowledge of the latter and not the former.
Going by the Unicode 6.0 Bengali chart, রে is a combination of the dependent vowel sign ে (0x09C7) and the consonant র (0x09B0) and is represented as a sequence of two characters in the character array.
If you are getting the dependent vowel sign alone in the resulting character sequence (and hence the string), then your optimization is likely to be kooky: it appears to assume that a Bengali character can be represented as a single Unicode code point, or a single char variable in Java. This would result in a scenario where a consonant is replaced by another consonant, but the dependent vowel preceding the consonant is never replaced.
I think a correct optimization must therefore consider the presence of dependent vowels and compare the following consonant in addition to the vowel, i.e. it must compare two characters in the character array instead of comparing individual characters. This might also imply that your method signature must change to accept a char[] instead of a single char, so that a Bengali character is replaced with the intended Bengali character, rather than one Unicode code point being replaced with another, which is what happens currently.
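To see what the char array actually holds, a small sketch like the following can dump each char as a code point. The class and method names are made up for illustration, and the string is built from Unicode escapes with the consonant first and the vowel sign second, which is an assumption about how the original text was stored.

```java
class CharDump {
    // Formats every char of s as U+XXXX, separated by single spaces.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed input: consonant RA followed by vowel sign E.
        String re = "\u09B0\u09C7";
        System.out.println(re.length()); // 2: one displayed unit, two chars
        System.out.println(dump(re));    // U+09B0 U+09C7
    }
}
```

Comparing or swapping a single char therefore only ever touches half of what is displayed as one unit.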
The notes in other answers on the ArrayIndexOutOfBoundsException are valid. The following example, which uses your character replacement algorithm, demonstrates that not only is your algorithm incorrect, but it is quite possible for the exception to be thrown:
class CodepointReplacer
{
public static void main(String[] args)
{
String str1 = "রেরেরে";
/*
* The following is a linguistically invalid sequence,
* but Java does not concern itself with linguistic correctness
* if the String or char sequence has been constructed incorrectly.
*/
String str2 = "েরেরের";
/*
* The character to look for in our strings: the single char at index 1.
* It is not the two-char sequence রে as one would anticipate.
*/
char c = str1.charAt(1);
optimizeKookily(str1, c);
optimizeKookily(str2, c);
}
private static void optimizeKookily(String temp, char c)
{
Integer length = temp.length();
char[] charArray = temp.toCharArray();
for (int u = 0; u < length; u++)
{
if (charArray[u] == c)
{
char g = charArray[u];
charArray[u] = charArray[u - 1]; //throws exception on second invocation of this method.
charArray[u - 1] = g;
}
}
}
}
A better character replacement strategy would therefore be to use the String.replace (the CharSequence variant) or String.replaceAll functions, assuming that you would know how to use these with Bengali characters.

The problem is in this loop:
for(int u=0;u<length;u++)
{
if(charArray[u]==c)
{
char g=charArray[u];
charArray[u]=charArray[u-1];
charArray[u-1]=g;
}
}
Look at what happens when u=0: charArray[u-1] refers to index -1, which does not exist. Modify your for loop to start at 1, or add a condition that skips the swap when u=0.

Your code can cause an ArrayIndexOutOfBoundsException.
When u=0 and charArray[0] equals c, charArray[u-1] refers to index -1.
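A minimal sketch of the bounds fix suggested in these answers follows. The class name SafeSwap is hypothetical. It only removes the exception; it does not address whether swapping single chars is the right thing to do for Bengali text.

```java
class SafeSwap {
    // Swaps every occurrence of c with the char before it.
    // Starting at index 1 guarantees that u - 1 is a valid index.
    static String swap(String temp, char c) {
        char[] charArray = temp.toCharArray();
        for (int u = 1; u < charArray.length; u++) {
            if (charArray[u] == c) {
                char g = charArray[u];
                charArray[u] = charArray[u - 1];
                charArray[u - 1] = g;
            }
        }
        return new String(charArray);
    }

    public static void main(String[] args) {
        System.out.println(swap("ab", 'b')); // ba
        System.out.println(swap("ba", 'b')); // ba (index 0 is never swapped)
    }
}
```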

Related

Empty Strings within a non empty String [duplicate]

I'm confused by this code:
public class StringReplaceWithEmptyString
{
public static void main(String[] args)
{
String s1 = "asdfgh";
System.out.println(s1);
s1 = s1.replace("", "1");
System.out.println(s1);
}
}
And the output is:
asdfgh
1a1s1d1f1g1h1
So my first guess was that every character in a String has an empty String "" on both sides. But if that were the case, there should be two '1's after 'a' in the second line of output (one for the end of 'a' and a second for the start of 's').
Then I checked whether a String is represented as a char[] in the questions "In Java, is a String an array of chars?" and "String representation in Java", and the answer was yes.
So I tried to assign an empty character '' to a char variable, but it gives me a compiler error:
Invalid character constant
The same compiler error occurs when I try it in a char[]:
char[] c = {'','a','','s'}; // compile-time error
So I'm confused about three things:
How is an empty String represented by a char[]?
Why am I getting that output for the above code?
How is the String s1 represented in a char[] when it is first initialized?
Sorry if I'm wrong in any part of my question.

Just adding some more explanation to Tim Biegeleisen's answer.
As of Java 8, the code of the replace method in the java.lang.String class is:
public String replace(CharSequence target, CharSequence replacement) {
return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
Here you can clearly see that the replacement is done by a regex Pattern matcher. In a regex, "" matches a zero-length position, and such a position exists before and after every character.
So, behind the scenes, your code is executed as follows:
Pattern.compile("".toString(), Pattern.LITERAL).matcher("asdfgh").replaceAll(Matcher.quoteReplacement("1".toString()));
The output becomes:
1a1s1d1f1g1h1

Going with Andy Turner's great comment, your call to String#replace() is actually implemented using String#replaceAll(). As such, there is a regex replacement happening here. The matches occur before the first character, in between each character in the string, and after the last character.
^|a|s|d|f|g|h|$
^ this and every pipe matches to empty string ""
The match you are making is a zero length match. In Java's regex implementation used in String.replaceAll(), this behaves as the example above shows, namely matching each inter-character position and the positions before the first and after the last characters.
Here is a reference which discusses zero length matches in more detail: http://www.regexguru.com/2008/04/watch-out-for-zero-length-matches/
A zero-width or zero-length match is a regular expression match that does not match any characters. It matches only a position in the string. E.g. the regex \b matches between the 1 and , in 1,2.
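The zero-length matches described above can be listed directly with a Matcher. This is only an illustrative sketch; the class and method names are hypothetical, and it uses the same Pattern.LITERAL flag as the Java 8 source quoted in another answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyMatchDemo {
    // Returns the start index of every match of the empty pattern in input.
    static List<Integer> emptyMatchPositions(String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile("", Pattern.LITERAL).matcher(input);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        // One match per position: before 'a', between each pair, after 'h'.
        System.out.println(emptyMatchPositions("asdfgh")); // [0, 1, 2, 3, 4, 5, 6]
        System.out.println("asdfgh".replace("", "1"));     // 1a1s1d1f1g1h1
    }
}
```

Six characters give seven positions, which is why there is exactly one "1" between neighbouring letters and not two.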

This is because it does a regex match of the pattern/replacement you pass to the replace().
public String replace(CharSequence target, CharSequence replacement) {
return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
Replaces each substring of this string that matches the literal target
sequence with the specified literal replacement sequence. The
replacement proceeds from the beginning of the string to the end, for
example, replacing "aa" with "b" in the string "aaa" will result in
"ba" rather than "ab".
Parameters:
target The sequence of char values
to be replaced
replacement The replacement sequence of char values
Returns: The resulting string
Throws: NullPointerException if target
or replacement is null.
Since:
1.5
Please read more at the link below ... (Also browse through the source code).
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/String.java#String.replace%28java.lang.CharSequence%2Cjava.lang.CharSequence%29
A regex such as "" matches every possible empty string within a string. In this case that is every zero-length position: at the start, at the end, and between each pair of adjacent characters.

Java Unicode replacement char with hex code

I need to replace all special unicode chars in a string with an escape char "\" and its hex value.
For example, this string:
String test="This is a string with the special è unicode char";
Should be replaced with:
"This is a string with the special \E8 unicode char";
where E8 is the hex value of unicode value of char "è".

I have two problems:
1) How to find "special chars", may I check for each char value if it is >127?
That depends on what your definition of a "special" character is. Testing for characters greater than 127 is testing for non-ASCII characters. It is up to you to decide if that is what you want.
2) And how to get hex value as string.
The Integer.toHexString method can be used for that.
3) May I use a regex to find it or a loop for every char of the string?
A loop is simpler.

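I'm trying this solution with a regex replacement (I don't know whether it is better to loop over each char in the string or to use the regex...)
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HexEscaper {
    // assumed value: the question asks for "\" as the escape char
    static final String ESCAPE_CHAR = "\\";

    // the method name is only illustrative
    static String escape(String test) {
        StringBuilder out = new StringBuilder();
        // anything outside printable ASCII (0x20-0x7E) counts as "special"
        Pattern pat = Pattern.compile("([^\\x20-\\x7E])");
        Matcher m = pat.matcher(test);
        int offset = 0;
        while (m.find()) {
            // copy the unchanged text before the match, then the escaped char
            out.append(test, offset, m.start());
            out.append(ESCAPE_CHAR);
            out.append(Integer.toHexString(m.group(1).charAt(0)).toUpperCase());
            offset = m.end();
        }
        // copy the text after the last match
        out.append(test.substring(offset));
        return out.toString();
    }
}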

If your unicode characters are all in the range \u0080 - \u00FF you might use this solution.
public class Convert {
public static void main(String[] args) {
String in = "This is a string with the special è unicode char";
StringBuilder out = new StringBuilder(in.length());
for (int i = 0; i < in.length(); i++) {
char charAt = in.charAt(i);
if (charAt > 127) {
// if there is a high number of conversions (e.g. 100,000 or more),
// this leads to fewer objects to be garbage collected
out.append('\\').append(Integer.toHexString(charAt).toUpperCase());
// out.append(String.format("\\%X", (int) charAt));
} else {
out.append(charAt);
}
}
System.out.println(out);
}
}

Correct way to trim a string in Java

In Java, I am doing this to trim a string:
String input = " some Thing ";
System.out.println("before->>"+input+"<<-");
input = input.trim();
System.out.println("after->>"+input+"<<-");
Output is:
before->> some Thing <<-
after->>some Thing<<-
Works. But I wonder if by assigning a variable to itself, I am doing the right thing. I don't want to waste resources by creating another variable and assigning the trimmed value to it. I would like to perform the trim in-place.
So am I doing this right?

You are doing it right. From the documentation:
Strings are constant; their values cannot be changed after they are created. String buffers support mutable strings. Because String objects are immutable they can be shared.
Also from the documentation:
trim
public String trim()
Returns a copy of the string, with leading and trailing whitespace
omitted. If this String object represents an empty character sequence,
or the first and last characters of character sequence represented by
this String object both have codes greater than '\u0020' (the space
character), then a reference to this String object is returned.
Otherwise, if there is no character with a code greater than '\u0020'
in the string, then a new String object representing an empty string
is created and returned.
Otherwise, let k be the index of the first character in the string
whose code is greater than '\u0020', and let m be the index of the
last character in the string whose code is greater than '\u0020'. A
new String object is created, representing the substring of this
string that begins with the character at index k and ends with the
character at index m-that is, the result of this.substring(k, m+1).
This method may be used to trim whitespace (as defined above) from the
beginning and end of a string.
Returns:
A copy of this string with leading and trailing white space removed, or this string if it has no leading or trailing white
space.

As strings in Java are immutable objects, there is no way to trim in place. The only thing you can do to trim the string is create a new, trimmed version of it and return that (and this is what the trim() method does).

In theory, you are not assigning a variable to itself. You are assigning the value returned by the trim() method to your variable input.
In practice, the trim() implementation is optimized so that it creates (and returns) another String only when necessary. In other cases (when there is actually nothing to trim) it returns a reference to the original string (in which case you are effectively assigning the variable to itself).
See http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/String.java#String.trim%28%29
Anyway trim() does not modify original string, so this is the right way to use it.
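The behaviour described here can be checked with a reference comparison. This is a small sketch with a hypothetical class name; the identity check relies on the documented guarantee, quoted above, that trim() returns this string when there is nothing to remove.

```java
class TrimDemo {
    // True when trim() hands back the very same String object.
    static boolean returnsSameObject(String s) {
        return s.trim() == s;
    }

    public static void main(String[] args) {
        String padded = " some Thing ";
        String trimmed = padded.trim();
        System.out.println("->>" + padded + "<<-");  // the original is untouched
        System.out.println("->>" + trimmed + "<<-"); // the new, trimmed String
        System.out.println(returnsSameObject(padded));       // false
        System.out.println(returnsSameObject("some Thing")); // true
    }
}
```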

String::strip…
The old String::trim method has a strange definition of whitespace: it removes every character whose code is U+0020 or lower, and nothing else.
Java 11 adds new strip… methods to the String class. These use a more Unicode-savvy definition of whitespace. See the rules of this definition in the class Javadoc for Character::isWhitespace.
Example code.
String input = " some Thing ";
System.out.println("before->>"+input+"<<-");
input = input.strip();
System.out.println("after->>"+input+"<<-");
Or you can strip just the leading or just the trailing whitespace.
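A short sketch of the one-sided variants and of the difference from trim() follows. It assumes Java 11 or later; the class name and the show helper are hypothetical. The EM SPACE character (U+2003) is used because its code is above U+0020, so trim() ignores it, while Character.isWhitespace accepts it.

```java
class StripDemo {
    // Wraps s in markers so leading and trailing whitespace is visible.
    static String show(String s) {
        return "->>" + s + "<<-";
    }

    public static void main(String[] args) {
        String input = " some Thing ";
        System.out.println(show(input.stripLeading()));  // ->>some Thing <<-
        System.out.println(show(input.stripTrailing())); // ->> some Thing<<-

        // EM SPACE on both sides: trim() keeps it, strip() removes it.
        String wide = "\u2003some Thing\u2003";
        System.out.println(wide.trim().length());  // 12
        System.out.println(wide.strip().length()); // 10
    }
}
```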

The traditional approach is to use the trim method inline...for example:
String input = " some Thing ";
System.out.println("before->>"+input+"<<-");
System.out.println("after->>"+input.trim()+"<<-");
If it is a string that should be trimmed for all usages, trim it up front like you have done. Re-using the same variable like you have done is not a bad idea if you want to communicate your intent to other developers. When writing in Java, memory management is not the key issue, since the "gift" of Java is that you do not need to manage it.

Yes, but there will still be two objects until the garbage collector removes the original value that input was pointing to. Strings in Java are immutable. Here is a good explanation: Immutability of Strings in Java.

If we have to trim a String without using Java's trim() or split() methods, the following source code can be helpful. Note that it also collapses each run of spaces inside the string into a single space.
static String allTrim(String str)
{
    char[] ch = str.toCharArray();
    int len = ch.length;
    int count = 0; // spaces seen since the last non-space character
    int i = 0;
    StringBuilder bchar = new StringBuilder();
    // skip the leading spaces; the bounds check also covers
    // empty and all-space input
    while (i < len && ch[i] == ' ')
    {
        i++;
    }
    for (; i < len; i++)
    {
        if (ch[i] != ' ')
        {
            // emit one space for any run of spaces between two words
            if (count > 0)
            {
                bchar.append(' ');
                count = 0;
            }
            bchar.append(ch[i]);
        }
        else
        {
            count++;
        }
    }
    // trailing spaces are never flushed, so they are dropped
    return bchar.toString();
}

The Java String trim() method eliminates leading and trailing spaces:
public class StringTrimExample{
public static void main(String args[]){
String s1=" hello string ";
System.out.println(s1+"javatpoint");//without trim()
System.out.println(s1.trim()+"javatpoint");//with trim()
}}
output
hello string javatpoint
hello stringjavatpoint

How to represent empty char in Java Character class

I want to represent an empty character in Java as "" in String...
Like that char ch = an empty character;
Actually I want to replace a character without leaving space.
I think it might be sufficient to understand what this means: no character not even space.

You may assign '\u0000' (or 0).
For this purpose, use Character.MIN_VALUE.
Character ch = Character.MIN_VALUE;

char means exactly one character. You can't assign zero characters to this type.
That means that there is no char value for which String.replace(char, char) would return a string with a different length.
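The point about length can be illustrated with a small sketch (class and method names are hypothetical): the char overload of replace always keeps the length, while the CharSequence overload with an empty replacement actually shortens the string.

```java
class ReplaceLengthDemo {
    // Swaps target for the NUL char; the result has the same length as s.
    static String replaceWithNul(String s, char target) {
        return s.replace(target, '\0');
    }

    // Removes every occurrence of target; the result may be shorter than s.
    static String removeAll(String s, char target) {
        return s.replace(String.valueOf(target), "");
    }

    public static void main(String[] args) {
        System.out.println(replaceWithNul("abcdefg", 'd').length()); // 7
        System.out.println(removeAll("abcdefg", 'd'));               // abcefg
    }
}
```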

As Character is a class deriving from Object, you can assign null as "instance":
Character myChar = null;
Problem solved ;)

An empty String is a wrapper on a char[] with no elements. You can have an empty char[]. But you cannot have an "empty" char. Like other primitives, a char has to have a value.
You say you want to "replace a character without leaving a space".
If you are dealing with a char[], then you would create a new char[] with that element removed.
If you are dealing with a String, then you would create a new String (String is immutable) with the character removed.
Here are some samples of how you could remove a char:
public static void main(String[] args) throws Exception {
String s = "abcdefg";
int index = s.indexOf('d');
// delete a char from a char[]
char[] array = s.toCharArray();
char[] tmp = new char[array.length-1];
System.arraycopy(array, 0, tmp, 0, index);
System.arraycopy(array, index+1, tmp, index, tmp.length-index);
System.err.println(new String(tmp));
// delete a char from a String using replace
String s1 = s.replace("d", "");
System.err.println(s1);
// delete a char from a String using StringBuilder
StringBuilder sb = new StringBuilder(s);
sb.deleteCharAt(index);
s1 = sb.toString();
System.err.println(s1);
}

As chars can be represented as integers (ASCII codes), you can simply write:
char c = 0;
Code 0 in ASCII is the NUL character.

If you want to remove a character from a String without leaving any empty space, you can achieve this by using StringBuilder. String is an immutable object in Java; you cannot modify it.
String str = "Hello";
StringBuilder sb = new StringBuilder(str);
sb.deleteCharAt(1); // removes the character 'e'

I was looking for this. Simply set char c = 0; and it works for display purposes. Try it.
For example, if you are trying to remove duplicate characters from a String, one way would be to convert the string to a char array and store the chars in a HashSet of characters, which automatically prevents duplicates.
Another way, however, is to convert the string to a char array, use two for-loops to compare each character with the rest of the char array (an O(N^2) activity), and for each duplicate found set that char to 0.
Then use new String(char[]) to convert the resulting char array to a string and print it with System.out.println (this is all Java, by the way). On most consoles the chars set to zero are not shown, so the duplicates appear to be gone. Be aware that the zeroed chars are still part of the string: its length does not change.
So yes, set char c = 0; or, for a char array, set cArray[i] = 0 for that specific duplicate character, and it will no longer be displayed.
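Both approaches from this answer can be sketched side by side. The class and method names are hypothetical, and a LinkedHashSet is used instead of a plain HashSet (a substitution, so that the original order is kept). The second method shows that zeroed chars stay in the string even though many consoles do not display them.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class Dedupe {
    // Keeps the first occurrence of each char, in the original order.
    static String removeDuplicates(String s) {
        Set<Character> seen = new LinkedHashSet<>();
        for (char ch : s.toCharArray()) {
            seen.add(ch);
        }
        StringBuilder sb = new StringBuilder(seen.size());
        for (char ch : seen) {
            sb.append(ch);
        }
        return sb.toString();
    }

    // Sets every repeated char to 0; the array, and so the String, keeps its length.
    static String zeroDuplicates(String s) {
        char[] a = s.toCharArray();
        for (int i = 0; i < a.length; i++) {
            for (int j = i + 1; j < a.length; j++) {
                if (a[i] != 0 && a[j] == a[i]) {
                    a[j] = 0;
                }
            }
        }
        return new String(a);
    }

    public static void main(String[] args) {
        System.out.println(removeDuplicates("banana").length()); // 3
        System.out.println(zeroDuplicates("banana").length());   // 6
    }
}
```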

You can't. "" is the literal for a string, which contains no characters. It does not contain the "empty character" (whatever you mean by that).

In Java there is no such thing as an empty character literal. In other words, '' has no meaning, unlike "", which is an empty String literal.
The closest you can get to representing an empty character literal is a zero-length char[], something like:
char[] cArr = {}; // cArr is a zero-length array
char[] cArr2 = new char[0]; // this does the same
If you look at the String class, its default constructor creates an empty character sequence using new char[0].
Also, using Character.MIN_VALUE is not correct, because it is not really an empty character but rather the smallest value of type char.
I also don't like Character c = null; as a solution, mainly because the JVM will throw a NullPointerException if it tries to unbox it. Secondly, null is a reference to nothing and applies to reference types, whereas here we are dealing with a primitive type, which does not accept null as a possible value.
Assuming that the OP wants to replace all occurrences of a character, say 'x', in a string, say str, with the empty character '', then try using:
str.replace("x", "");

char ch = Character.MIN_VALUE;
The code above will initialize the variable ch with the minimum value that a char can have (i.e. \u0000).

this is how I do it.
char[] myEmptyCharArray = "".toCharArray();

You can do something like this:
mystring.replace(""+ch, "");

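String before = EMPTY_SPACE + TAB + "word" + TAB + EMPTY_SPACE;
where
EMPTY_SPACE = " " (a String)
TAB = '\t' (a char)
String after = before.replaceAll(" ", "").replace('\t', '\0');
On most consoles, after then prints as "word". Note that it still contains a '\0' char in place of each tab, so after.equals("word") is false. To really drop the tabs, use replace("\t", "") instead.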

You can only re-use an existing character, e.g. \0. If you put this in a String, you will have a String with one character in it.
Say you want a char such that when you do
String s =
char ch = ?
String s2 = s + ch; // there is no char which does this.
assert s.equals(s2);
what you have to do instead is
String s =
char ch = MY_NULL_CHAR;
String s2 = ch == MY_NULL_CHAR ? s : s + ch;
assert s.equals(s2);

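You can use \b (the backspace escape) as the second parameter. Note that this does not remove anything from the string: each space becomes a backspace character, and what you see depends on how the console renders it.
String test= "Anna Banana";
System.out.println(test); //prints Anna Banana
System.out.println(test.replaceAll(" ","\b")); //the space is replaced by a backspace character, not removed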
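A quick check suggests that the \b replacement only changes what a console may show, not how many characters the string holds. The sketch below (hypothetical class and method names) compares it with replacing the space by an empty string.

```java
class BackspaceDemo {
    // Replaces each space with a backspace character (U+0008).
    static String withBackspaces(String s) {
        return s.replaceAll(" ", "\b");
    }

    // Actually removes each space.
    static String withoutSpaces(String s) {
        return s.replace(" ", "");
    }

    public static void main(String[] args) {
        String test = "Anna Banana";
        System.out.println(withBackspaces(test).length()); // 11, same as before
        System.out.println(withoutSpaces(test));           // AnnaBanana
    }
}
```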

Doing minus operation on string

I have a small problem with the minus operation in java. When the user press the 'backspace' key, I want the char the user typed, to be taken away from the word which exists.
e.g
word = myname
and after one backspace
word = mynam
This is kind of what I have:
String sentence = "";
char c = evt.getKeyChar();
if(c == '\b') {
sentence = sentence - c;
} else {
sentence = sentence + c;
}
The add operation works. So if I add a letter, it adds to the existing word. However, the minus isn't working. Am I missing something here? Or doing it completely wrong?

Strings don’t have any kind of character subtraction that corresponds to concatenation with the + operator. You need to take a substring from the start of the string to one before the end, instead; that’s the entire string except for the last character. So:
sentence = sentence.substring(0, sentence.length() - 1);

For convenience, Java supports string concatenation with the '+' sign. This is the one binary operator with a class type as an operand. See String concatenation operator in the Java Language Specification.
Java does not support an overload of the '-' operator between a String and a char.
Instead, you can remove a character from a string by adding the substrings before and after.
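As a minimal sketch of that idea (the class and method names are made up for illustration), removing the char at a given index is just the concatenation of the two substrings around it:

```java
class RemoveChar {
    // Returns s without the char at index; index must be in [0, s.length()).
    static String removeCharAt(String s, int index) {
        return s.substring(0, index) + s.substring(index + 1);
    }

    public static void main(String[] args) {
        System.out.println(removeCharAt("myname", 5));  // mynam
        System.out.println(removeCharAt("abcdefg", 3)); // abcefg
    }
}
```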

sentence = sentence.substring(0, sentence.length() - 1);
There is no corresponding operator to + which allows you to delete a character from a String.

You should investigate the StringBuilder class, eg:
StringBuilder sentence = new StringBuilder();
Then you can do something like:
sentence.append(a);
for a new character or
sentence.deleteCharAt(sentence.length() - 1);
Then when you actually want to use the string, use:
String s = sentence.toString();
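Put together, the key handling from the question could look like the sketch below. The class name and the typeAll helper are hypothetical; the length check is an addition so that a backspace on an empty buffer does nothing instead of throwing.

```java
class BackspaceBuffer {
    private final StringBuilder sentence = new StringBuilder();

    // Handles one typed char: backspace deletes the last char, anything else is appended.
    void type(char c) {
        if (c == '\b') {
            if (sentence.length() > 0) {
                sentence.deleteCharAt(sentence.length() - 1);
            }
        } else {
            sentence.append(c);
        }
    }

    String text() {
        return sentence.toString();
    }

    // Convenience helper: feeds every char of keys through type().
    static String typeAll(String keys) {
        BackspaceBuffer buffer = new BackspaceBuffer();
        for (char c : keys.toCharArray()) {
            buffer.type(c);
        }
        return buffer.text();
    }

    public static void main(String[] args) {
        System.out.println(typeAll("myname\b")); // mynam
    }
}
```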
