I have an HTML file that looks like this:
<body>
<p>Hello! <b>[NAME]%</b></p>
</body>
And this is what I have in my Java file:
String name = "John";
My question is:
How do I fill John into the [NAME]% placeholder in Java?
After doing so, how do I convert it to a base64-encoded string in Java?
Thank you for your help!
Your placeholder uses several characters that Java's regular-expression engine treats specially. If you have done text processing in Java before, the String.replace(CharSequence, CharSequence) method should accomplish what you are attempting to do.
There are three String replace methods that take string arguments. Two of them, replaceAll and replaceFirst, take a regular expression, so they would expect you to "escape" the brackets that you have typed.
Here is the text, copied from Oracle/Sun's Java documentation for: java.lang.String
String replace(CharSequence target, CharSequence replacement)
Replaces each substring of this string that matches the literal target
sequence with the specified literal replacement sequence.
String replaceAll(String regex, String replacement)
Replaces each substring of this string that matches the given regular
expression with the given replacement.
String replaceFirst(String regex, String replacement)
Replaces the first substring of this string that matches the given
regular expression with the given replacement.
Just so you are aware: the two methods that say "regex" in the parameter list expect the regex String to be escaped like this for pattern-matching purposes:
// Regular-expression programming with java.lang.String: several "escaped" characters!
// ALSO NOTE: each backslash must be doubled inside a Java string literal!
String replacePattern = "\\[NAME\\]%";
yourText = yourText.replaceFirst(replacePattern, "John"); // Strings are immutable, so keep the returned value
These "back-slashes from hell" are required because the regular-expression engine treats '[' and ']' as special characters (metacharacters), so they must be escaped to be matched literally. Please review the java.util.regex.Pattern documentation to understand how String.replaceFirst and String.replaceAll interpret the regex argument. Alternatively, if you use String.replace, Java performs a plain literal match, specifically:
yourText = yourText.replace("[NAME]%", "John");
Here is a link to Sun/Oracle's page on java.util.regex.Pattern:
https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
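The difference between the literal and the regex-based methods can be sketched as follows. This is only an illustration: the class name PlaceholderDemo and its two helper methods are made up for this example, while Pattern.quote and Matcher.quoteReplacement are standard JDK methods that do the escaping for you.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderDemo {
    // Literal replacement: no regex is involved, so the brackets need no escaping.
    static String fillLiteral(String html, String name) {
        return html.replace("[NAME]%", name);
    }

    // Regex replacement: Pattern.quote escapes the placeholder, and
    // Matcher.quoteReplacement protects '$' and '\' inside the replacement text.
    static String fillRegex(String html, String name) {
        return html.replaceAll(Pattern.quote("[NAME]%"), Matcher.quoteReplacement(name));
    }

    public static void main(String[] args) {
        String html = "<body>\n<p>Hello! <b>[NAME]%</b></p>\n</body>";
        System.out.println(fillLiteral(html, "John"));
        System.out.println(fillRegex(html, "John"));
    }
}
```

Both calls produce the same text here; the quoting only matters once the placeholder or the name contains regex metacharacters.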
NOTE: The answer below is copied from Google's answer about Base64 encoding. I personally do not quite understand your question. Are you talking about UTF-8? Unicode? What do you mean by a "Base64 encoded String"?
What is the use of base64 encoding in Java?
String encodeToString(byte[] src)
Encodes the specified byte array into a String using the Base64
encoding scheme.
Base64.Encoder withoutPadding()
Returns an encoder instance that encodes equivalently to this one, but
without adding any padding character at the end of the encoded byte data.
OutputStream wrap(OutputStream os)
Wraps an output stream for encoding byte data using the Base64
encoding scheme.
What is base64 encoding in Java?
Base64 is a binary-to-text encoding scheme that represents binary data in a printable ASCII string format by translating it into a radix-64 representation. Each Base64 digit represents exactly 6 bits of binary data.
Here is a link to Sun's Page on the issue:
https://docs.oracle.com/javase/8/docs/api/java/util/Base64.Encoder.html
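For the second half of the question, here is a minimal sketch, assuming Java 8 or later (java.util.Base64 does not exist in earlier versions) and assuming UTF-8 is the byte encoding you want. The class name Base64Demo and its helper methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64Demo {
    // Turn the text into bytes first (UTF-8 assumed), then Base64-encode the bytes.
    static String encode(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Reverse direction: Base64 text back to the original string.
    static String decode(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String html = "<body>\n<p>Hello! <b>[NAME]%</b></p>\n</body>";
        String filled = html.replace("[NAME]%", "John");
        String encoded = encode(filled);
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(filled)); // round trip check
    }
}
```

Base64 works on bytes rather than characters, which is why the charset has to be chosen explicitly before encoding.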
Related
I'm doing some Freebase queries. Sometimes the result of the query contains Unicode characters. How could I convert those characters into a Java String? (e.g., The_Police_$0028band$0029 → The_Police_(band)). I've tried:
new String(arg_in_byte,"UTF-8")
but it doesn't work. I saw in another question that one solution is the method replaceAll but I think that there is some other method that will be cleaner.
Those aren't UTF-8 encoded; they use a private, Freebase-specific escaping of Unicode code points. If your Java client library for Freebase doesn't include the necessary decoding method, you'll need to write one yourself: take the four digits after the dollar sign ($), interpret them as a hexadecimal integer, and convert that to a Java char (Java also uses Unicode code points internally).
Here is some documentation on the encoding:
http://wiki.freebase.com/wiki/MQL_key_escaping
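A decoder along the lines described above could look like the sketch below. It assumes that every dollar sign followed by exactly four hex digits is an escape and handles nothing else; the class name MqlKeyDecoder is made up for this example and is not part of any Freebase library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MqlKeyDecoder {
    // A dollar sign followed by exactly four hexadecimal digits.
    private static final Pattern ESCAPE = Pattern.compile("\\$([0-9A-Fa-f]{4})");

    static String decode(String key) {
        Matcher m = ESCAPE.matcher(key);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Interpret the four digits as a hexadecimal UTF-16 code unit.
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and '\' in the result from being reinterpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("The_Police_$0028band$0029"));
    }
}
```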
I have a string like this
"\x27\x18\xf6,\x03\x12\x8e\xfa\xec\x11\x0dHL"
When I put it in the browser console, it automatically becomes something else:
"\x27\x18\xf6,\x03\x12\x8e\xfa\xec\x11\x0dHL"
"'ö,úìHL"
If I call charAt(x) on this string, I get:
"\x27\x18\xf6,\x03\x12\x8e\xfa\xec\x11\x0dHL".charAt(0)
"'"
"\x27\x18\xf6,\x03\x12\x8e\xfa\xec\x11\x0dHL".charAt(1)
""
"\x27\x18\xf6,\x03\x12\x8e\xfa\xec\x11\x0dHL".charAt(2)
"ö"
which IS what I want.
Now I want to implement a Java program that reads the string the same way as in browser.
The problem is, Java does not recognize the way this string is encoded. Instead, it treats it as a normal string:
"\\x27\\x18\\xf6,\\x03\\x12\\x8e\\xfa\\xec\\x11\\x0dHL".charAt(0) == '\\'
"\\x27\\x18\\xf6,\\x03\\x12\\x8e\\xfa\\xec\\x11\\x0dHL".charAt(1) == 'x'
"\\x27\\x18\\xf6,\\x03\\x12\\x8e\\xfa\\xec\\x11\\x0dHL".charAt(2) == '2'
What kind of encoding is this string in? What kind of encoding uses the prefix \x?
Is there a way to read it properly (get the same result as in browser)?
Update: I found a solution. I guess it is not the best, but it works for me:
StringEscapeUtils.unescapeJava("\\x27\\x18\\xf6,\\x03\\x12\\x8e\\xfa\\xec\\x11\\x0dHL".replace("\\x", "\\u00"))
Thank you all for your replies :) especially Ricardo Cacheira
\x followed by two hex digits denotes the character with that hexadecimal code; \x03, for example, is the character with code 3,
so this: "\x30\x31" is the same as: "01"
see that page: http://www.asciitable.com
Another thing: when you paste your string into a Java string literal, your IDE converts every \ to \\
Java strings use Unicode escapes, so this: "\x30\x31" in Java is: "\u0030\u0031";
You can't use the escape sequences \u000a and \u000d in a Java string literal; write \n and \r respectively instead.
So this "\u0027\u0018\u00f6,\u0003\u0012\u008e\u00fa\u00ec\u0011\rHL" is the Java equivalent of this: "\x27\x18\xf6,\x03\x12\x8e\xfa\xec\x11\x0dHL"
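If the text arrives at runtime with the backslashes still in it (as in the question), the conversion can also be done without any library. The following is a sketch only, assuming each escape is a backslash, an x, and exactly two hex digits; the class name HexEscapeDecoder is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HexEscapeDecoder {
    // A literal backslash, the letter x, then exactly two hexadecimal digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\x([0-9A-Fa-f]{2})");

    static String decode(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Values 0x00 to 0xFF map directly onto the first 256 Unicode code points.
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "\\x27\\x18\\xf6,\\x03\\x12\\x8e\\xfa\\xec\\x11\\x0dHL";
        String decoded = decode(raw);
        System.out.println(decoded.length());
        System.out.println((int) decoded.charAt(2)); // numeric code of the third character
    }
}
```

This mirrors what the browser console does with the same escapes, so charAt gives the same characters as in the question.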
Apache Commons Lang provides a helper for this:
StringEscapeUtils.unescapeJava(...)
Unescapes any Java literals found in the String. For example, it will turn a sequence of '\' and 'n' into a newline character, unless the '\' is preceded by another '\'.
So I'm currently using the Apache Commons Lang library.
When I tried unescaping this string: &#128512;
This returns the same string: &#128512;
String characters = "&#128512;";
StringEscapeUtils.unescapeHtml(characters);
Output: &#128512;
But when I tried unescaping a string with fewer characters, it works:
String characters = "&#12851;";
StringEscapeUtils.unescapeHtml(characters);
Output: ㈳
Any ideas? When I tried unescaping the string "&#128512;" with an online unescaping utility, it works, so maybe it's a bug in the Apache Commons Lang library? Or can anyone recommend another library?
Thanks.
UPDATES:
I'm now able to unescape the string successfully. The problem now is that when I try to escape the result of that unescape, it does not give back the original string (&#128512;).
unescapeHtml() leaves &#128512; untouched because, as the documentation says, it only unescapes HTML 4.0 entities, which are limited to 65,536 characters. Unfortunately, 128,512 is far beyond that limit.
Have you tried using unescapeXml()?
XML supports up to 1,114,111 (10FFFFh) character entities.
This is the Unicode character U+1F600 (decimal 128512), GRINNING FACE.
Refer to the URL for details.
The string you have mentioned is the HTML escape of U+1F600. If you unescape it using Apache Commons Lang, it will give you the required smiley, as shown in the screenshot.
The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP). Characters whose code points are greater than U+FFFF are called supplementary characters. The Java platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes. In this representation, supplementary characters are represented as a pair of char values, the first from the high-surrogates range, (\uD800-\uDBFF), the second from the low-surrogates range (\uDC00-\uDFFF).
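The surrogate-pair representation can be observed directly in Java. The sketch below uses only standard java.lang methods; the class name SurrogateDemo and the helper fromCodePoint are made up for the illustration.

```java
class SurrogateDemo {
    // Build a String from a single Unicode code point (may need two char values).
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = fromCodePoint(0x1F600); // GRINNING FACE, decimal 128512
        System.out.println(s.length());                              // 2 char values
        System.out.println(s.codePointCount(0, s.length()));         // 1 code point
        System.out.println(Character.isHighSurrogate(s.charAt(0)));  // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));   // true
        System.out.println(Integer.toHexString(s.codePointAt(0)));   // 1f600
    }
}
```

Any code that walks a string char by char will see the two surrogates separately, which is the usual reason supplementary characters get mangled.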
Regarding your update that it is not converting back to &#128512;:
You can also represent a character using a Numeric Character Reference, of the form &#dddd;, where dddd is the decimal value representing the character's Unicode scalar value. You can alternatively use a hexadecimal representation &#xhhhh;, where hhhh is the hexadecimal value equivalent to the decimal value.
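Converting in both directions can be sketched in plain Java by working with code points instead of char values. This is an illustration under stated assumptions, not how Apache Commons Lang does it: the class NumericCharRef and both methods are hypothetical, only decimal and hexadecimal numeric references are handled (no named entities such as &amp;), and escapeNonAscii rewrites every character above U+007F.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumericCharRef {
    // Matches hexadecimal (&#x...;) or decimal (&#...;) numeric character references.
    private static final Pattern REF =
            Pattern.compile("&#(?:[xX]([0-9A-Fa-f]{1,6})|([0-9]{1,7}));");

    // Replace every non-ASCII code point with a decimal numeric character reference.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i); // reads a full surrogate pair if present
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    // Replace numeric character references with the characters they denote.
    static String unescape(String text) {
        Matcher m = REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int cp = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            // Leave the reference as it is when the number is not a valid code point.
            String repl = Character.isValidCodePoint(cp)
                    ? new String(Character.toChars(cp))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(repl));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String smiley = unescape("&#128512;");
        System.out.println(smiley.codePointAt(0) == 0x1F600);
        System.out.println(escapeNonAscii(smiley));
    }
}
```

Using codePointAt and Character.charCount is what lets the escape direction emit one reference for the whole surrogate pair rather than two references for the separate halves.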
A good site for this
I have added a few SoP (System.out.println) statements to help you understand this Unicode handling better.
Well, the solution is pretty easy:
use org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4 instead! (unless you're using Java <1.5, which you probably aren't)
String characters = "&#128512;";
StringEscapeUtils.unescapeHtml4(characters);
I think the problem is that the method does not know a Unicode character for "&#128512;",
so it simply returns the string unchanged.
The documentation of the function says only:
Returns: a new unescaped String, null if null string input
If it's an HTML-specific question, then you can just use JavaScript for this purpose.
You can do
escape("&#128512;") which gives you %26%23128512%3B
unescape("%26%23128512%3B") which gives you back &#128512;
For XML, if a single Unicode character is written as \ue123 in Java,
how can a string of two such characters be written?
Note: I tried \u123\u123 but it didn't work!
Well \u123\u123 doesn't work because \u needs to be followed by four hex digits. But this should work fine:
String text = "\u0123\u0123";
Note that this is just the Java string literal side - it has nothing to do with XML. XML has different ways of escaping the characters it needs to, but if you use an appropriate encoding (e.g. UTF-8) you shouldn't need to escape non-ASCII characters.
I have a Java String like this: "peque\u00f1o". Note that it has an embedded Unicode character: '\u00f1'.
Is there a method in Java that will replace these Unicode character sequences with the actual characters? That is, a method that would return "pequeño" if you gave it "peque\u00f1o" as input?
Note that my string has 12 chars (the ones we see, which all happen to be in the ASCII range).
Actually, the string is "pequeño":
String s = "peque\u00f1o";
System.out.println(s.length());
System.out.println(s);
yields
7
pequeño
i.e. seven chars and the correct representation on System.out.
I remember giving the same response last week: use org.apache.commons.lang.StringEscapeUtils.
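If the string really does contain the literal backslash, the letter u, and four hex digits (the 12-char case from the question), the conversion can also be sketched with the JDK alone. The class name UnicodeUnescaper is hypothetical, and the sketch only handles four-digit Unicode escapes, not the other Java escapes that StringEscapeUtils.unescapeJava covers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeUnescaper {
    // A literal backslash, the letter u, then exactly four hexadecimal digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Each escape denotes one UTF-16 code unit.
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "peque\\u00f1o";          // 12 chars: the escape is literal text here
        String fixed = unescape(raw);
        System.out.println(raw.length());      // 12
        System.out.println(fixed.length());    // 7
        System.out.println(fixed);
    }
}
```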
If you have the appropriate fonts, a println or setting the string in a JLabel or JTextArea should do the trick. The escaping is only for the compiler.
If you plan to copy-paste the readable strings into source files, remember to also choose a suitable file encoding such as UTF-8.