Removing items from String - java

I am trying to replace all occurrences of a substring in a String.
I want to replace "\t\t\t" with "<3tabs>"
I want to replace "\t\t\t\t\t\t" with "<6tabs>"
I want to replace "\t\t\t\t" with "< >"
I am using
s = s.replace("\t\t\t\t", "< >");
s = s.replace("\t\t\t", "<3tabs>");
s = s.replace("\t\t\t\t\t\t", "<6tabs>");
But it's no use; it does not replace anything. Then I tried using:
s = s.replaceAll("\t\t\t\t", "< >");
s = s.replaceAll("\t\t\t", "<3tabs>");
s = s.replaceAll("\t\t\t\t\t\t", "<6tabs>");
Again, no use; it does not replace anything. After trying these two methods, I tried StringBuilder.
I was able to replace the items through StringBuilder. My question is: why am I unable to replace the items directly in the String with the two approaches above? Is there any method with which I can replace items directly in a String?

Try replacing in this order, longest sequence first:
String s = "This\t\t\t\t\t\tis\t\t\texample\t\t\t\t";
s = s.replace("\t\t\t\t\t\t", "<6tabs>");
s = s.replace("\t\t\t\t", "< >");
s = s.replace("\t\t\t", "<3tabs>");
System.out.print(s);
output:
This<6tabs>is<3tabs>example< >

<6tabs> is never going to find a match, because the replacements before it will already have consumed those tabs (with your order, six tabs become "< >" followed by two leftover tabs).
You need to start with the longest match first.
Strings are immutable, so you can't modify them in place: s.replace() returns a new String with the modifications applied. You assign that back to s, though, so that part should work fine.
Put things in the correct order and step through it with a debugger to see what is happening.
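The same longest-first idea can be sketched as a small helper that keeps the replacements in an ordered map and reassigns the result on every step, since String.replace returns a new string. TabReplacer and replaceTabRuns are hypothetical names; the patterns are the ones from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TabReplacer {
    // Hypothetical helper: entries are inserted longest run first, so six tabs
    // are consumed by "<6tabs>" before a shorter pattern can split them up.
    static String replaceTabRuns(String s) {
        Map<String, String> replacements = new LinkedHashMap<>();
        replacements.put("\t\t\t\t\t\t", "<6tabs>");
        replacements.put("\t\t\t\t", "< >");
        replacements.put("\t\t\t", "<3tabs>");
        for (Map.Entry<String, String> e : replacements.entrySet()) {
            // replace() never modifies s; it returns a new String,
            // so the result has to be assigned back.
            s = s.replace(e.getKey(), e.getValue());
        }
        return s;
    }

    public static void main(String[] args) {
        String s = "This\t\t\t\t\t\tis\t\t\texample\t\t\t\t";
        s.replace("\t\t\t", "<3tabs>");            // result discarded, s is unchanged
        System.out.println(s.contains("<3tabs>")); // false
        System.out.println(replaceTabRuns(s));     // This<6tabs>is<3tabs>example< >
    }
}
```

A LinkedHashMap is used only because it preserves insertion order; if the patterns were not fixed, sorting them by descending length before the loop would give the same guarantee.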

Take a look at this
Go through your text by turning it into a char[] array, then use a for loop to go through the individual characters.
Don't print them out directly; print them with a %x format specifier (or %d if you prefer decimal numbers).
char[] characters = myString.toCharArray();
for (char c : characters)
{
System.out.printf("%x%n", (int) c); // %x needs an integer argument, so cast the char
}
Get an ASCII table, look up the numbers for the characters, and see whether there are any \n, \f or \r characters. Do this both before and after your replacement.
Different operating systems use different line-terminating characters: Windows uses \r\n and Linux uses \n (I checked with a quick Google search for "line terminator Linux Windows"). You should be able to see which one you have from your example. Obviously, if you strip \n and leave \r, the text may still break into separate lines.
You might be more successful if you write a regular expression (see this part of the Java Tutorial, etc) which includes whitespace and line terminators, and use it as a delimiter with the String.split() method, then print the individual tokens in order.
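A minimal sketch of the split-based approach, with made-up sample text: in Java's regex syntax \s covers space, \t, \n, vertical tab, \f and \r, so \\s+ treats both Windows (\r\n) and Unix (\n) line endings as delimiters. The class and method names are hypothetical.

```java
import java.util.Arrays;

class SplitOnWhitespace {
    // Splits on any run of whitespace, including tabs and line terminators.
    static String[] words(String text) {
        // trim() first so that leading whitespace does not produce an empty first token.
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String text = "first line\r\nsecond\tline\nthird";
        System.out.println(Arrays.toString(words(text))); // [first, line, second, line, third]
    }
}
```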

Related

Having problem when comparing two strings

I'm reading a CSV file using Java. Inside the file, each row is in this format:
operation, start, end.
I need to do a different operation for different inputs, but something weird happened when I tried to compare two strings.
I used equals to compare the two strings. One of the operations is "add", but the first element I fetch from the file always gives me the wrong answer. I know it's an "add", and when I print it out it looks like "add", but operation.equals("add") returns false. For all the other Strings it's correct, except the first one. Is there anything special about the first row in a CSV file?
Here is my code:
while ((line = br.readLine()) != null) {
String[] data = line.split(",");
String operation = data[0];
int start = Integer.parseInt(data[1]);
int end = Integer.parseInt(data[2]);
System.out.println(operation + " " + start + " " + end);
System.out.println(operation.equals("add"));
}
For example, it printed out
add 1 3
false
add 4 6
true
And I really don't know why. These two "add"s look exactly the same.
And here is what my CSV file looks like:
[screenshot of the CSV file]
There are (at least) four reasons why two strings that "look" the same when you display or print them could turn out to be non-equal:
1. If you compare Strings using == rather than equals(Object), you will often get the wrong answer. (This is not the problem here, since you are using the equals method; however, it is a common problem.)
2. Unexpected leading or trailing whitespace characters in one string. These can be removed using trim().
3. Other leading, trailing or embedded control characters or "funky" Unicode characters, for example stray Unicode BOM (byte order mark) characters.
4. Homoglyphs: there are a number of cases where two or more distinct Unicode code points are rendered on screen using the same, or virtually the same, glyph.
Cases 3 and 4 can only be reliably detected by using trace prints or a debugger to examine the lengths and the char values of the two strings.
(Screenshots of the CSV file won't help us diagnose this! A cut-and-paste of the CSV file might help.)
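A minimal trace-print sketch along those lines: it reports the length and every code point of a string, which makes invisible characters such as a byte order mark (U+FEFF) show up. The helper name describe and the sample value are hypothetical; the BOM is just one possible culprit.

```java
class StringInspector {
    // Returns the length followed by each code point in U+XXXX form.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder("length=" + s.length() + ":");
        s.codePoints().forEach(cp -> sb.append(String.format(" U+%04X", cp)));
        return sb.toString();
    }

    public static void main(String[] args) {
        // What a first field read from a UTF-8 file with a BOM can look like.
        String fromFile = "\uFEFFadd";
        System.out.println(fromFile);           // may print as just "add"
        System.out.println(describe(fromFile)); // length=4: U+FEFF U+0061 U+0064 U+0064
        System.out.println(describe("add"));    // length=3: U+0061 U+0064 U+0064
    }
}
```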
You should remove the double quotes from the first element and then check with the equals method.
Try this:
operation = operation.substring(1, operation.length() - 1);
System.out.println(operation.equals("add"));
Hope it works for you.
The line in your image looks fine, so I suspect the document encoding. For example, a file saved as UTF-8 with a BOM has a special marker (the byte order mark) at the very beginning; if you do not account for it, it ends up in the first value you read. That could be why only the first word is read incorrectly.
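If a BOM is the culprit, one way to handle it is sketched below, assuming the file is UTF-8: Java's UTF-8 decoder does not remove a leading BOM, so it arrives as the character U+FEFF at the start of the first line and can be stripped by hand. The byte array stands in for the real CSV file, and BomAwareCsv and stripBom are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class BomAwareCsv {
    // Removes a single leading U+FEFF (byte order mark), if present.
    static String stripBom(String line) {
        return line.startsWith("\uFEFF") ? line.substring(1) : line;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a CSV file saved as "UTF-8 with BOM": EF BB BF, then the text.
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        byte[] text = "add,1,3\nadd,4,6\n".getBytes(StandardCharsets.UTF_8);
        byte[] file = new byte[bom.length + text.length];
        System.arraycopy(bom, 0, file, 0, bom.length);
        System.arraycopy(text, 0, file, bom.length, text.length);

        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            boolean first = true;
            while ((line = br.readLine()) != null) {
                if (first) {
                    line = stripBom(line); // only the very first line can start with a BOM
                    first = false;
                }
                String operation = line.split(",")[0].trim();
                System.out.println(operation.equals("add")); // true for both rows
            }
        }
    }
}
```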

Escape character '\' doesn't show in System.out.println() but in return value

In Java, when I replace characters in a String with escaped characters, backslashes show up in the return value, although they were not there according to System.out.println.
String[][][] proCategorization(String[] pros, String[][] preferences) {
String str = "wehnquflkwe,wefwefw,wefwefw,wefwef";
String strReplaced = str.replace(",","\",\""); //replace , with ","
System.out.println(strReplaced);
// Console output: wehnquflkwe","wefwefw","wefwefw","wefwef
String[][][] array3d = new String[1][1][1]; // initialize 3d array
array3d[0][0][0] = strReplaced;
System.out.println(array3d[0][0][0]);
return array3d;
}
The console output is:
wehnquflkwe","wefwefw","wefwefw","wefwef
Now the return value is:
[[["wehnquflkwe\",\"wefwefw\",\"wefwefw\",\"wefwef"]]]
I don't understand why the \ shows up in the return value but not in the System.out.println output.
Characters in memory can be represented in different ways.
Your integrated development environment (IDE) has a debugger that chooses to represent a String[][][] with a single element that contains the characters
wehnquflkwe","wefwefw","wefwefw","wefwef
as a java-quoted string
"wehnquflkwe\",\"wefwefw\",\"wefwefw\",\"wefwef"
This makes a lot of sense, because you can then copy and paste the string into Java code without any loss.
On the other hand, your system's console, and the IDE's built-in terminal emulator, will output the characters in their normal representation, that is, without any java string-escape-characters:
wehnquflkwe","wefwefw","wefwefw","wefwef
As an experiment, you may want to check what happens with other "special" characters, such as \t (tab) or \b (backspace). This is just the tip of the iceberg: characters in Java generally translate into Unicode code points, which may or may not be supported by the fonts available on your system or terminal. The IDE's way of representing characters as Java-quoted strings lets it represent pretty much anything losslessly; System.out.println's output is a lot more variable.
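To see that only the representation differs, here is a rough sketch of what a debugger-style view does: it renders the same characters as a Java string literal by escaping backslashes and double quotes. The helper toJavaLiteral is hypothetical and deliberately incomplete (it handles only backslash, double quote, tab and newline); it is not how any particular IDE is implemented.

```java
class LiteralView {
    // Hypothetical, simplified: escapes only backslash, double quote, tab and newline.
    static String toJavaLiteral(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\t': sb.append("\\t"); break;
                case '\n': sb.append("\\n"); break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        String strReplaced = "wehnquflkwe,wefwefw".replace(",", "\",\"");
        System.out.println(strReplaced);                // wehnquflkwe","wefwefw
        System.out.println(strReplaced.length());       // 21: no backslashes are stored
        System.out.println(toJavaLiteral(strReplaced)); // "wehnquflkwe\",\"wefwefw"
    }
}
```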
System.out.println prints the characters of the String exactly as they are stored in memory.
On the other hand, when you stop the application using a breakpoint, you can inspect the values.
Most IDEs display escape characters with \ to make clear that the value is a single String (not a String[] in this case), or to avoid splitting the String across two lines if it contains \n in the middle.
In case you still have doubts, I suggest printing strReplaced.length(). This lets you count the characters one by one.
Possible experiments:
String s = "my cute \n two line String";
System.out.println(s + " length is: " + s.length());

Issues with delimiter ("\t | \n") Java

I am having issues with the delimiter in my Scanner. I am using a Scanner to read a text file and put tokens into a string. My tutor told me to use the delimiter useDelimiter("\t|\n"). However, each token it grabs ends in \r (due to a carriage return in the text file). This is fine for printing purposes, but I need to get the string length, and instead of returning the number of actual characters, it returns the number of characters including that \r. Is there a better delimiter I can use that will accomplish the same thing without grabbing the \r? The code is as follows:
studentData.useDelimiter("\t|\n");
while (studentData.hasNext())
{
token = studentData.next();
int tokenLength = token.length();
statCalc(tokenLength);
}
I am well aware that I could simply remove the last character of the string token. However, for many reasons, I just want it to grab the token without the \r. Any and all help would be greatly appreciated.
Try this:
studentData.useDelimiter("\\t|\\R");
The \R pattern (available since Java 8) matches any line break; see the java.util.regex.Pattern documentation.
I guess the remaining \r character is a partially consumed line break in a Windows environment. With the delimiter above, the Scanner will consume the whole line break.
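To see the effect, here is a small sketch that tokenizes the same made-up data twice, assuming Windows \r\n line endings and Java 8 or later. The class DelimiterDemo and its tokens helper are hypothetical; they just collect what Scanner.next() returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterDemo {
    // Collects every token the Scanner produces for the given delimiter regex.
    static List<String> tokens(String input, String delimiter) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            sc.useDelimiter(delimiter);
            while (sc.hasNext()) {
                result.add(sc.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data with Windows (\r\n) line endings.
        String data = "Alice\t90\r\nBob\t85\r\n";
        System.out.println(tokens(data, "\t|\n").get(1).length());   // "90\r" -> 3
        System.out.println(tokens(data, "\\t|\\R").get(1).length()); // "90"   -> 2
    }
}
```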
Remove all carriage returns and line feeds from your string. Try this:
s = s.replaceAll("\\n", "");
s = s.replaceAll("\\r", "");
A Windows-style line ending is usually \r\n, but your delimiter ignores \r. Your regex pattern (\t|\n) can be improved to:
(\t|\r\n|\r|\n)
However, it looks to me like what you're trying to create is a "tokenizer" that breaks a text file into words (since you're also looking for \t), so my guess is that you're better off with:
studentData.useDelimiter("\\s+");
which treats any run of whitespace, including \t, \r and \n, as a single delimiter. (Use + rather than *: a delimiter such as \\s* can also match the empty string.)
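Whether the delimiter can match the empty string makes a visible difference. In the sketch below (made-up input, hypothetical tokens helper), \\s* matches zero characters between letters, so Scanner returns one character per token, while \\s+ yields whole words. Scanner's default delimiter is already a whitespace pattern, so not calling useDelimiter at all should behave like the \\s+ case for this input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class WhitespaceDelimiters {
    // Collects every token the Scanner produces for the given delimiter regex.
    static List<String> tokens(String input, String delimiter) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(input).useDelimiter(delimiter)) {
            while (sc.hasNext()) {
                result.add(sc.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "Bob\t85\r\n";
        System.out.println(tokens(line, "\\s+")); // [Bob, 85]
        System.out.println(tokens(line, "\\s*")); // single characters: [B, o, b, 8, 5]
    }
}
```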
You can learn more about regular expressions.

Java Splitting a string with multiple delimiters, some of which are 2-character sequences

long-time reader here but first-time poster! I am working on a college project that involves using Java to manipulate transcriptions of traditional music melodies written in the text-based abc notation standard (see here for a quick explainer on the abc standard, if you are interested).
I want to take the body of a whole tune transcription which is represented as a String, and split it into individual bars (i.e. into an array of Strings, one String for each bar). The abc standard has a number of different symbols and combinations of symbols that are used to delimit bars. These symbols are:
|
|]
||
[|
|:
:|
::
My idea was to use a regular expression with the String.split() method to break the tuneBody String below into the arrayOfBars array of Strings. My regex is below, and is intended to try to find any of the above symbols that can be used to delimit a bar in the music.
import java.util.Arrays;
public class TroubleshootRegex
{
//Split the tuneBody into individual bars
public static void main(String[] args)
{
//The musical notes from an abc tune transcription
String tuneBody = "|:G3 GAB|A3 ABd|edd gdd|edB dBA|\nGAG GAB|ABA ABd|edd gdd|BAF G3:|\nB2B d2d|ege dBA|B2B dBG|ABA AGA|\nBAB d^cd|ege dBd|gfg aga|bgg g3:|";
//The body of the tune after being split into individual bars
String[] arrayOfBars;
//This regex is my attempt to look for all the possible bar delimiters defined in the abc standard
String abcBarDelimiters = "[\\|]|\\|\\||\\[\\||\\|:|:\\||::|\\|]";
arrayOfBars = tuneBody.split(abcBarDelimiters);
System.out.println(Arrays.toString(arrayOfBars));
}
}
Unfortunately, when I run the above, I end up with a couple of issues. One of the issues is that I get an empty string at the start of the array, but a bit of research shows me that that's a known issue so I'll figure out a way to work around that. The bigger issue though that I can't seem to figure out on my own is that I end up with a colon included in the first bar of the music, whereas this should be filtered out as part of the initial delimiter when splitting the string if everything worked as intended. i.e. I want the initial "|:" delimiter from tuneBody to be removed during the string splitting. Here's the output:
[, :G3 GAB, A3 ABd, edd gdd, edB dBA,
GAG GAB, ABA ABd, edd gdd, BAF G3,
B2B d2d, ege dBA, B2B dBG, ABA AGA,
BAB d^cd, ege dBd, gfg aga, bgg g3]
I'm assuming that means that I probably have some kind of problem in my regex, but for the life of me I can't seem to figure out what the actual problem is, and I'm starting to go cross-eyed looking at it! It seems that it is matching the single pipe character at the start as a delimiter, rather than matching the character sequence |:
I'd be massively grateful if anyone who actually knows a bit about regexes can tell me why mine doesn't seem to do what I want, or how to get it to see the |: sequence as a whole as a delimiter, rather than a delimiter followed by a colon.
Thanks in advance!
One of the issues is that I get an empty string at the start of the array, but a bit of research shows me that that's a known issue so I'll figure out a way to work around that.
The problem is that your string starts with a delimiter so it will create an empty string as the first element of the split. The same would happen if you have two consecutive delimiters, e.g. ...|::|.... To solve that you could remove the empty strings you don't want, e.g. by using a list instead of an array.
The bigger issue though that I can't seem to figure out on my own is that I end up with a colon included in the first bar of the music, whereas this should be filtered out as part of the initial delimiter when splitting the string if everything worked as intended. i.e. I want the initial "|:" delimiter from tuneBody to be removed during the string splitting.
I'm not entirely sure here (but pretty sure): the problem is that the single pipe is the first option in your regex and thus it matches the pipe in |:. To fix that it should be sufficient to put the single pipe at the end.
You can also simplify your regex since you don't need character classes. Thus this should work:
String abcBarDelimiters = "\\|\\||\\[\\||\\|:|:\\||::|\\|\\]|\\|";
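Putting both points together, here is a sketch of the corrected regex in use: the longer delimiters are listed before the lone |, and the unwanted pieces (the empty string before the leading |: and the \n at the start of each line's first bar) are cleaned up afterwards with a stream. The class and method names are hypothetical, and trimming the \n is an extra step the question did not ask for.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SplitBars {
    // Two-character delimiters come first so that "|:" is tried before a lone "|".
    static final String BAR_DELIMITERS = "\\|\\||\\[\\||\\|:|:\\||::|\\|\\]|\\|";

    static List<String> splitBars(String tuneBody) {
        return Arrays.stream(tuneBody.split(BAR_DELIMITERS))
                .map(String::trim)             // drop the "\n" before each line's first bar
                .filter(bar -> !bar.isEmpty()) // drop the empty string from the leading "|:"
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String tuneBody = "|:G3 GAB|A3 ABd|edd gdd|edB dBA|\nGAG GAB|ABA ABd|edd gdd|BAF G3:|\n"
                + "B2B d2d|ege dBA|B2B dBG|ABA AGA|\nBAB d^cd|ege dBd|gfg aga|bgg g3:|";
        List<String> bars = splitBars(tuneBody);
        System.out.println(bars.size() + " bars: " + bars); // first bar is "G3 GAB"
    }
}
```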
To go easier on a regex beginner's eyes, try the following:
public static void main(String[] args) {
//The musical notes from an abc tune transcription
String tuneBody = "|:G3 GAB|A3 ABd|edd gdd|edB dBA|\nGAG GAB|ABA ABd|edd gdd|BAF G3:|\nB2B d2d|ege dBA|B2B dBG|ABA AGA|\nBAB d^cd|ege dBd|gfg aga|bgg g3:|";
//The body of the tune after being split into individual bars
String re1 = "\\|[\\]\\||:]?"; // |, |], ||, |:
String re2 = "\\[\\|"; // [|
String re3 = ":[\\|:]"; // :|, ::
String abcBarDelimiters = "(" + re1 + "|" + re2 + "|" + re3 + ")";
String[] arrayOfBars = tuneBody.split(abcBarDelimiters);
System.out.println(Arrays.toString(arrayOfBars));
}
... and as Thomas already said, the empty string at the beginning is due to the input starting with a delimiter.

Is it possible to add data to a string after adding "\0" (null)?

I have a string that I am creating, and I need to add multiple "\0" (null) characters to the string. Between each null character, is other text data (Just ASCII alphanumeric characters).
My problem is that in J2SE when you add the first null (\0), java then seems to determine that it's a string terminator, (similar to C++), and ignores all other data being appended. No error is raised, the trailing data is just ignored. I need to force the additional trailing data after a null in the string. I have to do this for a legacy database that I am supporting.
I have tried to encode/decode the string in hoping that something like %00 would fool the interpretation of the string behaviour, but when I re-encode the string, Java sees the null character again, and removes all data after the first null.
Update: Here is the relevant code snippet. Yes, I am trying to use Strings. I intend to try chars, but I still have to save it into the database as a string, so I suspect that I will end up with the same problem.
Some background. I am receiving data via HTTP post that has "\n". I need to remove the newlines and replace them with "\0". The "debug" method is just a simple method that does System.out.println.
String[] arrLines = sValue.split("\n");
for(int k=0;k<arrLines.length;k++) {
if (0<k) {
sNewValue += "\0";
}
sNewValue+= arrLines[k];
debug("New value =" + sNewValue);
}
sNewValue, a String, is committed to the database and needs to be a String. What I observe when I display the current value of sNewValue in the console after each iteration is something like this:
Input is value1\nvalue2\nvalue3
The console output from this code is:
value1
value1
value1
I am expecting
value1
value1 value2
value1 value2 value3
with non-printable null between value1, value2 and value3 respectively. Note that the value actually getting saved back into the database is also just "value1". So, it's not just a console display problem. The data after \0 is getting ignored.
I strongly suspect this is nothing to do with the text in the string itself - I suspect it's just how it's being displayed. For example, try this:
public class Test {
public static void main(String[] args) {
String first = "first";
String second = "second";
String third = "third";
String text = first + "\0" + second + "\0" + third;
System.out.println(text.length()); // Prints 18
}
}
This prints 18, showing that all the characters are present. However, if you try to display text in a UI label, I wouldn't be surprised to see only first. (The same may be true in fairly weak debuggers.)
Likewise you should be able to use:
char c = text.charAt(7);
And now c should be 'e' which is the second letter of "second".
Basically, I'd expect the core of Java not to care at all about the fact that it contains U+0000. It's just another character as far as Java is concerned. It's only at boundaries with native code (e.g. display) that it's likely to cause a problem.
If this doesn't help, please explain exactly what you've observed - what it is that makes you think the rest of the data isn't being appended.
EDIT: Another diagnostic approach is to print out the Unicode value of each character in the string:
for (int i = 0; i < text.length(); i++) {
System.out.println((int) text.charAt(i));
}
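A minimal sketch of the question's transformation that checks the characters rather than the display, assuming plain \n line breaks in the input. It uses String.replace(char, char) instead of the split-and-concatenate loop; unlike split, it also keeps any trailing empty lines. The class and method names are hypothetical.

```java
class NullSeparatedValue {
    // Replaces every '\n' with U+0000, as the question describes.
    static String newlinesToNulls(String value) {
        return value.replace('\n', '\0');
    }

    public static void main(String[] args) {
        String joined = newlinesToNulls("value1\nvalue2\nvalue3");
        System.out.println(joined.length());          // 20: nothing after the first null was lost
        System.out.println(joined.replace('\0', '|')); // value1|value2|value3
    }
}
```

Replacing the null character with a visible one such as '|' before printing is a quick way to confirm where the separators are, without relying on how the console or a database tool renders U+0000.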
I suggest you use a char[] or List<Character> instead, since it sounds like you're not really using a String as such (a real String doesn't normally contain nulls or other unprintable characters).
Same behavior for the StringBuffer class?
Since "\0" causes trouble, I would recommend not using it.
I would use a better delimiter in the string and replace it with "\0" only when actually writing the string to your DB.
This is because \ is an escape character in Java (as in many C-related languages), and you need to escape it with an additional \ as follows:
String str="\\0Java language";
System.out.println(str);
and you should then see \0Java language on the console.
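The difference between the two literals can be checked directly: "\0" is an octal escape for the single character U+0000, while "\\0" is two characters, a backslash followed by the digit zero. This changes what is stored, so it only helps if the legacy database expects the two-character text \0 rather than a real null character; the question does not say which. The class and constant names are hypothetical.

```java
class NullEscapes {
    static final String NULL_CHAR = "\0";       // octal escape: one character, U+0000
    static final String BACKSLASH_ZERO = "\\0"; // two characters: a backslash, then '0'

    public static void main(String[] args) {
        System.out.println(NULL_CHAR.length());        // 1
        System.out.println((int) NULL_CHAR.charAt(0)); // 0
        System.out.println(BACKSLASH_ZERO.length());   // 2
        System.out.println(BACKSLASH_ZERO);            // \0
    }
}
```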
