Java doesn't recognize my unicode emoji (error: unknown emoji)

I have a problem with Strings and Unicode in Java.
I'm currently working on a bot for Discord and have to pass it a string with an emoji. For this I use the Java-specific form; for example, I want the "fire" emoji. If I write the Java escape sequence (\uD83D\uDD25) directly in a string literal, it works, but if I use the return value of my method (also a string) there, it no longer works.
Hence my question: does it make a difference whether the Java escape sequence is typed in manually or produced at runtime? Maybe Java can't recognize that the second one is also Unicode?
Thanks for your help.
String emoji1 = "\uD83D\uDD25";
String emoji2 = convertToJava(":fire:"); //return a String with the content "\uD83D\uDD25"
msg.addReaction(ReactionEmoji.of(id, emoji1, isAnimated)).block(); //this is working
msg.addReaction(ReactionEmoji.of(id, emoji2, isAnimated)).block(); //this returns me an error called "unknown emoji"
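
One possible explanation, sketched under the assumption that convertToJava returns the twelve literal characters backslash, u, D, 8, 3, D and so on, rather than the two-char surrogate pair: the compiler translates Unicode escapes only in source code, so a string assembled at runtime is never reinterpreted. The class name and the decodeUnicodeEscapes helper below are hypothetical and only illustrate the difference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmojiEscapeDemo {
    // Matches a literal backslash, the letter u, and four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Hypothetical helper: replaces each escape written out as text
    // with the single char it names.
    static String decodeUnicodeEscapes(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String compiled = "\uD83D\uDD25";   // escape resolved by the compiler: 2 chars
        String runtime = "\\uD83D\\uDD25";  // what a method may return: 12 plain chars
        System.out.println(compiled.length());
        System.out.println(runtime.length());
        System.out.println(decodeUnicodeEscapes(runtime).equals(compiled));
    }
}
```

If the lengths of emoji1 and emoji2 differ in the same way, passing the decoded string (or having convertToJava return the real characters in the first place) should make both calls behave identically.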

Related

Encoding string doesn't work properly in java

I am developing a JavaFX application. I need to create a TreeView programmatically using Persian text for its node names.
The problem is that I see strange characters when I run the application. I have searched the web and similar questions on SO, and I wrote a function to do the encoding based on the answers to one of those questions:
public static String getUTF(String encodeString) {
return new String(encodeString.getBytes(StandardCharsets.ISO_8859_1),
StandardCharsets.UTF_8);
}
And I use it to convert my string to build the TreeView:
CheckBoxTreeItem<String> userManagement =
new CheckBoxTreeItem<>(GlobalItems.getUTF("کاربران"));
This answer doesn't work properly for some characters:
I still get strange results. If I don't use encoding, I get:

For hard-coded string literals you need to tell the javac compiler to use the same encoding as the Java source file, say UTF-8. Check the IDE / build settings. To test this, you can u-escape some Farsi characters, e.g.
\u062f for Dal, د. If the escaped characters come through correctly while the unescaped literals do not, the compiler is using the wrong encoding.
A String always contains Unicode, so no hacky reconversion into new Strings is needed.
When reading files with text, you need to convert those bytes (byte[]/InputStream) to Java text (String/Reader), specifying the encoding of those bytes.
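
As a rough sketch of the last two points: bytes are decoded once, with the charset they were actually written in, and an existing String needs no further conversion. The class and method names are made up for illustration; getUTF is copied from the question to show what the reconversion does to a character that ISO-8859-1 cannot represent.

```java
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Decode bytes that are known to hold UTF-8 text.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // The reconversion from the question, unchanged.
    static String getUTF(String encodeString) {
        return new String(encodeString.getBytes(StandardCharsets.ISO_8859_1),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String dal = "\u062F"; // the letter Dal, written as an escape
        byte[] utf8 = dal.getBytes(StandardCharsets.UTF_8);

        // Round trip with the correct charset keeps the text intact.
        System.out.println(decodeUtf8(utf8).equals(dal));

        // Dal has no ISO-8859-1 byte, so getBytes substitutes '?'
        // and the original character is lost.
        System.out.println(getUTF(dal));
    }
}
```

This is why the reconversion cannot repair a literal that was already compiled with the wrong source encoding; the fix belongs in the compiler settings.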

Java - Determine difference between user entering \n and pressing enter in html form

I searched the web for this but I'm not sure I'm asking the question correctly. I have a web form with a textarea. Users can type whatever they want (they can paste emails, etc.). When they submit, I escape things like newlines so that when I store the text in a PostgreSQL db (json column type) it saves correctly. That all works fine. However, if a user types something like c:\foo\bar\notworking.txt, the \n is treated like a new line, so I end up with
c:\foo\bar
otworking.txt
If I look at the string (user hits enter) coming into the controller (Spring based) I see \n.
Question is, how do I differentiate between someone typing \n and hitting the enter key?

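Easiest solution:
String s = ...;
s = s.replace("\\", "\\\\"); // literal replacement; replaceAll("\\", ...) throws because a lone backslash is not a valid regex
Then do the opposite after you load it back in.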

In order to insert raw text into a JSON column, you need to encode the text as JSON, meaning:
The following fails «badly»:
"c:\foo\bar\notworking.txt"
encodes to (assuming non-ASCII needs encoding, which it does if the DB is not UTF-8):
"The following fails \u00ABbadly\u00BB:\r\n \"c:\\foo\\bar\\notworking.txt\""
Side note: As a Java String literal, that would be:
String json = "\"The following fails \\u00ABbadly\\u00BB:\\r\\n \\\"c:\\\\foo\\\\bar\\\\notworking.txt\\\"\"";
You will then of course use PreparedStatement or Spring, such that SQL escaping and SQL Injection issues are non-existent.
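
To make the encoding step concrete, here is a minimal hand-written sketch of JSON string escaping. The class and method names are hypothetical, and escaping everything outside printable ASCII follows the assumption stated above (a non-UTF-8 database); in a real application a JSON library would normally do this work.

```java
class JsonEscapeDemo {
    // Hypothetical helper: wraps raw text in quotes and escapes it as a JSON string.
    static String toJsonString(String raw) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20 || c > 0x7E) {
                        // Control characters and non-ASCII become four-digit hex escapes.
                        sb.append(String.format("\\u%04X", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // A typed backslash followed by n stays two characters and gets doubled;
        // a real line break (Enter key) becomes the two-character escape instead.
        System.out.println(toJsonString("c:\\foo\\bar\\notworking.txt"));
        System.out.println(toJsonString("line one\nline two"));
    }
}
```

The two cases from the question stay distinguishable: the typed backslash is doubled, while the Enter key produces a single-backslash newline escape.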

Remove ASCII symbol from String [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 8 years ago.
I need to get rid of a character that looks exactly like the male ascii symbol from text - ♂. However, it's not the standard ASCII symbol, because if I paste it on StackExchange, it's displayed as indicated below:
How can I replace the character within a String? I've tried pasting the character directly into Eclipse but unfortunately that doesn't work (it looks exactly like the image above when pasted into Eclipse). You can see the symbol in Notepad++ however when using the search function:
However, when displayed inline, it looks like this:
Edit: Regarding @Greg-449's answer, I've tried that but the character still remains in the String. I don't think it's the default character. I'll show you where you can reference it from a website:
Thermaltake: Chassis > Versa > Versa H21
If you highlight the specifications & choose View selection source you'll notice it start appearing on line 63 after the word (optional).
How can I remove this symbol from the String? If at all possible, is there a way to exclude strange symbols like that in general?
Edit 2. After trying both suggested answers, I'm still not able to remove it from the String. A critical part I now see that I may have left out is that the text is copied from the website, into Microsoft Excel, then into a Java Applet (TextArea) where it is analyzed & manipulated from. Even though not visible in the text area, it still remains there when copied back into Excel after being manipulated.
Code tested is:
String descript = textArea.getText();
descript = descript.replace('\u000B', ' ');
textArea.setText(descript);
When taking this text back into Excel, the character remains.

This is a Unicode symbol so to paste it directly you need to be editing a file with a suitable encoding such as UTF-8 and you need to be using a font that can display the symbol.
In a Java string you can always use the Unicode escape to represent the character. The male symbol is Unicode U+2642 so the string would be:
"\u2642"
Update: Looking at the web site you reference, the character is actually a 'vertical tab (VT)' character, Unicode U+000B, which explains the 'VT' you see when it is displayed inline. You can use
"\u000B"
for this.
Use something like
String newString = oldString.replace('\u000B', ' ');
to get a new string with the VTs replaced by blanks.

The VT ("vertical tab") character is actually the ASCII character 11, or 0x0b. So it appears that this character is just displayed in a non-standard (neither ASCII nor Unicode) way by some tools.
Knowing that you're looking for the ASCII code 11, you could do char maleChar = (char)11; or String maleStr = "" + ((char)11); and then do your replacement operations based on that.
If, on the other hand, the data you have in your string is actually binary data, read for example from a stream, you'd probably be better off using a byte[] or int[] array in the first place.
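
For the more general request in the question (excluding strange symbols like this one), a regex character class is one option. The sketch below is only an illustration with made-up names: it replaces every ASCII control character except tab, carriage return and line feed with a blank, which covers the vertical tab discussed above.

```java
class ControlCharDemo {
    // Replaces ASCII control characters, except \t, \r and \n, with a space.
    static String stripControlChars(String text) {
        return text.replaceAll("[\\p{Cntrl}&&[^\\r\\n\\t]]", " ");
    }

    public static void main(String[] args) {
        String descript = "(optional)\u000BUSB 3.0"; // contains a vertical tab
        System.out.println(stripControlChars(descript));
    }
}
```

If the character still shows up after a round trip through Excel, it may be worth checking that the cleaned string is the one actually written back to the text area.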

Java XML string literals abruptly terminated

I have some Java code that looks like this:
String xml = "<string>" + escapeXml(input) + "</string>";
protected String escapeXml(String input) {
return input.replaceAll("&", "&amp;")
.replaceAll("'", "&apos;")
.replaceAll("\"", "&quot;")
.replaceAll("<", "&lt;")
.replaceAll(">", "&gt;");
}
input is a variable UTF-8 encoded string.
What I'm finding is that in some cases the xml string ends up being equal to <string> without the enclosing </string>. Why might this be? Is it possible for Java to evaluate escapeXml into something that truncates the string before </string> can be appended to it?
UPDATE: In response to Sotirios, let me add some clarifications. The xml string is being saved to a SQLite database column, which in turn is parsed by another utility. So far, I've noticed that this behavior occurs when the xml string saved to the database is either <string> or <string> with some non-ASCII Unicode character afterwards.
input is being fed automatically from a hook into an Android function. Because everything is running on Android in a non-standard configuration, it's a bit difficult to debug to learn exactly what's going on. I was hoping that there might be some obvious answer involving Java strings.

I never got to the bottom of this, but I did fix my problem by modifying the escapeXml function to use a proper XML encoder (org.apache.commons.lang library). I don't see how that would make a difference, but it did, and now the xml string is properly constructed.

Replace multiple backslashes with HTML entities in Java string for JSP generated Json

After hours of googling and searching within SO, I finally come to the place where I need to ask you! :)
The situation is the following:
A webservice delivers data in a CDATA. This data is parsed and put into our model. Using Spring MVC we access the model inside the JSP files to create... here comes the point... JSON! Don't ask, historically! ;-)
Now, somehow someone came up with the glorious idea to put multiple (back)slashes into a title property. The getTitle() method returns the string "/// Glasvegas \\". This of course doesn't work if we do a JavaScript eval() on the JSON (created within the JSP) to get the JavaScript JSON object. It simply interprets the backslashes as a comment, making the JSON invalid.
I tried to use the escapeHtml() methods from apache.common and springframework, but they both just ignore the backslashes while encoding all other special chars correctly.
Then I tried to write my own method:
public static String escapeHTML(String string) {
String foreslash="&#92;";
String regex="\\\\";
System.out.println(string.replaceAll(regex,foreslash));
string.replaceAll(regex,foreslash);
return string;
}
In the console output the string is correctly replaced, but if I break at the return and inspect the variable 'string' in the debugger, it's still "/// Glasvegas \\". The same is true in the generated JSP.
So, I'm kind of lost here.
Regards,
ASP

strings are immutable. the name of the method "replaceAll" makes it sound as though you're actually modifying the string object itself, but you're not. the method just returns the result of the operation. this is why you get the correct output from the System.out.println. but then you make an error of thought, thinking that just because the call is standing by itself, not inside a System.out.println, the java code should understand by itself that this time you want the change to be permanent in the string object ;)
try to rewrite the end of your method like this:
System.out.println(string.replaceAll(regex,foreslash));
return string.replaceAll(regex,foreslash);
also, the variable name "foreslash" makes it sound as though 92 is the code for a forward slash. maybe it is, i don't know. your regular expression then looks for a backslash. that's a bit confusing!
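
a small self-contained sketch of the same point, with a made-up class and method name: calling replaceAll without using its result leaves the original string untouched, while returning (or reassigning) the result gives the escaped version. it assumes the intended replacement is the HTML entity with code 92.

```java
class ImmutableStringDemo {
    // Returns a new string in which every backslash is replaced by the
    // HTML entity with code 92; the input itself is never modified.
    static String escapeBackslashes(String input) {
        return input.replaceAll("\\\\", "&#92;");
    }

    public static void main(String[] args) {
        String title = "/// Glasvegas \\\\";   // two backslashes at the end

        title.replaceAll("\\\\", "&#92;");     // result discarded: title is unchanged
        System.out.println(title);

        title = escapeBackslashes(title);      // result kept
        System.out.println(title);
    }
}
```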
