Given a string in Java, just take the first X letters - java

Is there something like a C# Substring for Java? I'm creating a mobile application for Blackberry device and due to screen constraints I can only afford to show 13 letters plus three dots for an ellipsis.
Any suggestion on how to accomplish this?
I need bare bones Java and not some fancy trick because I doubt a mobile device has access to a complete framework. At least in my experience working with Java ME a year ago.

You can do exactly what you want with String.substring().
String str = "please truncate me after 13 characters!";
if (str.length() > 16)
str = str.substring(0, 13) + "..."

String foo = someString.substring(0, Math.min(13, someString.length()));
Edit: Just for general reference, as of Guava 16.0 you can do:
String truncated = Ascii.truncate(string, 16, "...");
to truncate at a max length of 16 characters with an ellipsis.
Aside
Note, though, that truncating a string for display by character isn't a good system for anything where i18n might need to be considered. There are (at least) a few different issues with it:
You may want to take word boundaries and/or whitespace into account to avoid truncating at an awkward place.
Splitting surrogate pairs (though this can be avoided just by checking if the character you want to truncate at is the first of a surrogate pair).
Splitting a character and a combining character that follows it (e.g. an e followed by a combining character that puts an accent on that e.)
The appearance of a character may change depending on the character that follows it in certain languages, so just truncating at that character will produce something that doesn't even look like the original.
For these reasons (and others), my understanding is that best practice for truncation for display in a UI is to actually fade out the rendering of the text at the correct point on the screen rather than truncating the underlying string.

Whenever there is some operation that you would think is a very common thing to do, yet the Java API requires you to check bounds, catch exceptions, use Math.min(), etc. (i.e. requires more work than you would expect), check Apache's commons-lang. It's almost always there in a more concise format. In this case, you would use StringUtils#substring which does the error case handling for you. Here's what it's javadoc says:
Gets a substring from the specified String avoiding exceptions.
A negative start position can be used to start n characters from the end of the String.
A null String will return null. An empty ("") String will return "".
StringUtils.substring(null, *) = null
StringUtils.substring("", *) = ""
StringUtils.substring("abc", 0) = "abc"
StringUtils.substring("abc", 2) = "c"
StringUtils.substring("abc", 4) = ""
StringUtils.substring("abc", -2) = "bc"
StringUtils.substring("abc", -4) = "abc"

String str = "This is Mobile application."
System.out.println(str.subSequence(0, 13)+"...");

Related

Resetting fancy font to normal

I have a String named fancy, the String fancy is this "𝖑𝖒𝖆𝖔", however, I need to make "lmao" out of it.
I've tried calling String#trim, however with no success.
Example code:
var fancy = "𝖑𝖒𝖆𝖔"
var normal = //Magic to convert 𝖑𝖒𝖆𝖔 to lmao
EDIT: So I figured out, if I take the UTF-8 code of this fancy character, and subtract it by 120101, I get the original character, however, there are more types of these fancy texts so it does not seem like a solution for my problem.
You can take advantage of the fact that your "𝖆" character decomposes to a regular "a":
Decomposition LATIN SMALL LETTER A (U+0061)
Java's java.text.Normalizer class contains different normalizer forms. The NKFD and NKFC forms use the above decomposition rule.
String normal = Normalizer.normalize(fancy, Normalizer.Form.NFKC);
Using compatibility equivalence is what you need here:
Compatibility equivalence is a weaker type of equivalence between characters or sequences of characters which represent the same abstract character (or sequence of abstract characters), but which may have distinct visual appearances or behaviors.
(The reason you do not lose diacritics is because this process simply separates these diacritic marks from their base letters - and then re-combines them if you use the relevant form.)
Those are unicode characters: https://unicode-table.com also provides reverse lookup to identify them (copy-paste them into the search).
The fancy characters identify as:
𝖑 Mathematical Bold Fraktur Small L (U+1D591)
𝖒 Mathematical Bold Fraktur Small M 'U+1D592)
𝖆 Mathematical Bold Fraktur Small A (U+1D586)
𝖔 Mathematical Bold Fraktur Small O (U+1D594)
You also find them as 'old style english alphabet' on this list: https://unicode-table.com/en/sets/fancy-letters. There we notice that they are ordered and in the same way that the alphabetic characters are. So the characters have a fixed offset:
int offset = 0x1D586 - 'a' // 𝖆 is U+1D586
You can thus transform the characters back by subtracting that offset.
Now comes the tricky part: these unicode code points cannot be represented by a single char data type, which is only 16 bit, and thus cannot represent every single unicode character on its own (1-4 chars are actually needed, depending on unicode char).
The proper way to deal with this is to work with the code points directly:
String fancy = "𝖑𝖒𝖆𝖔";
int offset = 0x1D586 - 'a' // 𝖆 is U+1D586
String plain = fancy.codePoints()
.map(i-> i - offset)
.mapToObj(c-> (char)c)
.map(String::valueOf)
.collect(java.util.stream.Collectors.joining());
System.out.println(plain);
This then prints lmao.

Escape character '\' doesn't show in System.out.println() but in return value

In Java, when I replace characters in a String with escaped-characters, the characters show up in the return value, although they were not there according to System.out.println.
String[][][] proCategorization(String[] pros, String[][] preferences) {
String str = "wehnquflkwe,wefwefw,wefwefw,wefwef";
String strReplaced = str.replace(",","\",\""); //replace , with ","
System.out.println(strReplaced);
The console output is: wehnquflkwe","wefwefw","wefwefw","wefwef
String[][][] array3d = new String[1][1][1]; // initialize 3d array
array3d[0][0][0] = strReplaced;
System.out.println(array3d[0][0][0]);
return array3d;
}
The console output is:
wehnquflkwe","wefwefw","wefwefw","wefwef
Now the return value is:
[[["wehnquflkwe\",\"wefwefw\",\"wefwefw\",\"wefwef"]]]
I don't understand why the \ show up in the return value but not in the System.out.println.
Characters in memory can be represented in different ways.
Your integrated development environment (IDE) has a debugger that chooses to represent a String[][][] with a single element that contains the characters
wehnquflkwe","wefwefw","wefwefw","wefwef
as a java-quoted string
"wehnquflkwe\",\"wefwefw\",\"wefwefw\",\"wefwef"
this makes a lot of sense, because you can then copy and paste this string into java code without any loss.
On the other hand, your system's console, and the IDE's built-in terminal emulator, will output the characters in their normal representation, that is, without any java string-escape-characters:
wehnquflkwe","wefwefw","wefwefw","wefwef
As an experiment, you may want to check what happens with other "special" characters, such as \t (a tab break) or \b (backspace). This is just the tip of the iceberg - characters in Java generally translate into unicode points, which may or may not be supported by the fonts available in your system or terminal. The IDE's way of representing characters as java-quoted strings allows it to losslessly represent pretty much anything; System.out.println's output is a lot more variable.
System.out.println prints the String exactly as it is stored in memory.
On the other hand, when you stop the application flow using a breakpoint you are able to look up the values.
Most of the IDEs display escape characters with \ to indicate that it's just one String, not String[] in this case, or not to split the String into two lines if it contains \n in the middle.
Just in case, you still have doubts, I suggest printing strReplaced.length(). This should allow you to count characters one by one.
Possible experiments:
String s = "my cute \n two line String";
System.out.println(s + " length is: " + s.length());

Regex to match if string *only* contains *all* characters from a character set, plus an optional one

I ran into a wee problem with Java regex. (I must say in advance, I'm not very experienced in either Java or regex.)
I have a string, and a set of three characters. I want to find out if the string is built from only these characters. Additionally (just to make it even more complicated), two of the characters must be in the string, while the third one is **optional*.
I do have a solution, my question is rather if anyone can offer anything better/nicer/more elegant, because this makes me cry blood when I look at it...
The set-up
There mandatory characters are: | (pipe) and - (dash).
The string in question should be built from a combination of these. They can be in any order, but both have to be in it.
The optional character is: : (colon).
The string can contain colons, but it does not have to. This is the only other character allowed, apart from the above two.
Any other characters are forbidden.
Expected results
Following strings should work/not work:
"------" = false
"||||" = false
"---|---" = true
"|||-|||" = true
"--|-|--|---|||-" = true
...and...
"----:|--|:::|---::|" = true
":::------:::---:---" = false
"|||:|:::::|" = false
"--:::---|:|---G---n" = false
...etc.
The "ugly" solution
Now, I have a solution that seems to work, based on this stackoverflow answer. The reason I'd like a better one will become obvious when you've recovered from seeing this:
if (string.matches("^[(?\\:)?\\|\\-]*(([\\|\\-][(?:\\:)?])|([(?:\\:)?][\\|\\-]))[(?\\:)?\\|\\-]*$") || string.matches("^[(?\\|)?\\-]*(([\\-][(?:\\|)?])|([(?:\\|)?][\\-]))[(?\\|)?\\-]*$")) {
//do funny stuff with a meaningless string
} else {
//don't do funny stuff with a meaningless string
}
Breaking it down
The first regex
"^[(?\\:)?\\|\\-]*(([\\|\\-][(?:\\:)?])|([(?:\\:)?][\\|\\-]))[(?\\:)?\\|\\-]*$"
checks for all three characters
The next one
"^[(?\\|)?\\-]*(([\\-][(?:\\|)?])|([(?:\\|)?][\\-]))[(?\\|)?\\-]*$"
check for the two mandatory ones only.
...Yea, I know...
But believe me I tried. Nothing else gave the desired result, but allowed through strings without the mandatory characters, etc.
The question is...
Does anyone know how to do it a simpler / more elegant way?
Bonus question: There is one thing I don't quite get in the regexes above (more than one, but this one bugs me the most):
As far as I understand(?) regular expressions, (?\\|)? should mean that the character | is either contained or not (unless I'm very much mistaken), still in the above setup it seems to enforce that character. This of course suits my purpose, but I cannot understand why it works that way.
So if anyone can explain, what I'm missing there, that'd be real great, besides, this I suspect holds the key to a simpler solution (checking for both mandatory and optional characters in one regex would be ideal.
Thank you all for reading (and suffering ) through my question, and even bigger thanks for those who reply. :)
PS
I did try stuff like ^[\\|\\-(?:\\:)?)]$, but that would not enforce all mandatory characters.
Use a lookahead based regex.
^(?=.*\\|)(?=.*-)[-:|]+$
or
^(?=.*\\|)[-:|]*-[-:|]*$
or
^[-:|]*(?:-:*\\||\\|:*-)[-:|]*$
DEMO 1DEMO 2
(?=.*\\|) expects atleast one pipe.
(?=.*-) expects atleast one hyphen.
[-:|]+ any char from the list one or more times.
$ End of the line.
Here is a simple answer:
(?=.*\|.*-|.*-.*\|)^([-|:]+)$
This says that the string needs to have a '-' followed by '|', or a '|' followed by a '-', via the look-ahead. Then the string only matches the allowed characters.
Demo: http://fiddle.re/1hnu96
Here is one without lookbefore and -hind.
^[-:|]*\\|[-:|]*-[-:|]*|[-:|]*-[-:|]*\\|[-:|]*$
This doesn't scale, so Avinash's solution is to be preferred - if your regex system has the lookbe*.

Safe sending String argument to JavaScript function from Java

My Java project based on WebView component.
Now, I want to call some JS function with single String argument.
To do this, I'm using simple code:
webEngine.executeScript("myFunc('" + str + "');");
*str text is getting from the texarea.
This solution works, but not safe enough.
Some times we can get netscape.javascript.JSException: SyntaxError: Unexpected EOF
So, how to handle str to avoid Exception?
Letfar's answer will work in most cases, but not all, and if you're doing this for security reasons, it's not sufficient. First, backslashes need to be escaped as well. Second, the line.separator property is the server side's EOL, which will only coincidentally be the same as the client side's, and you're already escaping the two possibilities, so the second line isn't necessary.
That all being said, there's no guarantee that some other control or non-ASCII character won't give some browser problems (for example, see the current Chrome nul in a URL bug), and browsers that don't recognize JavaScript (think things like screenreaders and other accessibility tools) might try to interpret HTML special characters as well, so I normally escape [^ -~] and [\'"&<>] (those are regular expression character ranges meaning all characters not between space and tilde inclusive; and backslash, single quote, double quote, ampersand, less than, greater than). Paranoid? A bit, but if str is a user entered string (or is calculated from a user entered string), you need to be a bit paranoid to avoid a security vulnerability.
Of course the real answer is to use some open source package to do the escaping, written by someone who knows security, or to use a framework that does it for you.
I have found this quick fix:
str = str.replace("'", "\\'");
str = str.replace(System.getProperty("line.separator"), "\\n");
str = str.replace("\n", "\\n");
str = str.replace("\r", "\\n");

Java string replace and the NUL (NULL, ASCII 0) character?

Testing out someone elses code, I noticed a few JSP pages printing funky non-ASCII characters. Taking a dip into the source I found this tidbit:
// remove any periods from first name e.g. Mr. John --> Mr John
firstName = firstName.trim().replace('.','\0');
Does replacing a character in a String with a null character even work in Java? I know that '\0' will terminate a C-string. Would this be the culprit to the funky characters?
Does replacing a character in a String with a null character even work in Java? I know that '\0' will terminate a c-string.
That depends on how you define what is working. Does it replace all occurrences of the target character with '\0'? Absolutely!
String s = "food".replace('o', '\0');
System.out.println(s.indexOf('\0')); // "1"
System.out.println(s.indexOf('d')); // "3"
System.out.println(s.length()); // "4"
System.out.println(s.hashCode() == 'f'*31*31*31 + 'd'); // "true"
Everything seems to work fine to me! indexOf can find it, it counts as part of the length, and its value for hash code calculation is 0; everything is as specified by the JLS/API.
It DOESN'T work if you expect replacing a character with the null character would somehow remove that character from the string. Of course it doesn't work like that. A null character is still a character!
String s = Character.toString('\0');
System.out.println(s.length()); // "1"
assert s.charAt(0) == 0;
It also DOESN'T work if you expect the null character to terminate a string. It's evident from the snippets above, but it's also clearly specified in JLS (10.9. An Array of Characters is Not a String):
In the Java programming language, unlike C, an array of char is not a String, and neither a String nor an array of char is terminated by '\u0000' (the NUL character).
Would this be the culprit to the funky characters?
Now we're talking about an entirely different thing, i.e. how the string is rendered on screen. Truth is, even "Hello world!" will look funky if you use dingbats font. A unicode string may look funky in one locale but not the other. Even a properly rendered unicode string containing, say, Chinese characters, may still look funky to someone from, say, Greenland.
That said, the null character probably will look funky regardless; usually it's not a character that you want to display. That said, since null character is not the string terminator, Java is more than capable of handling it one way or another.
Now to address what we assume is the intended effect, i.e. remove all period from a string, the simplest solution is to use the replace(CharSequence, CharSequence) overload.
System.out.println("A.E.I.O.U".replace(".", "")); // AEIOU
The replaceAll solution is mentioned here too, but that works with regular expression, which is why you need to escape the dot meta character, and is likely to be slower.
Should be probably changed to
firstName = firstName.trim().replaceAll("\\.", "");
I think it should be the case. To erase the character, you should use replace(".", "") instead.
Does replacing a character in a String
with a null character even work in
Java?
No.
Would this be the culprit to the funky characters?
Quite likely.
This does cause "funky characters":
System.out.println( "Mr. Foo".trim().replace('.','\0'));
produces:
Mr[] Foo
in my Eclipse console, where the [] is shown as a square box. As others have posted, use String.replace().

Categories