I am using Java's URLEncoder to take a user-provided string and create a string that is safe to use as a filename. What I'm wondering is whether it is possible for two different strings to be encoded to the same value.
For example, since % is the character used to escape special characters, is it possible that something like "ABC D" and "ABC%20D" both end up as the same encoded value? Or will the encoder always replace characters like % with something else?
It seems to escape the % character itself, so the two inputs do not collide. Using your example input:
String result = URLEncoder.encode("ABC%20D", "UTF-8");
System.out.println(result); //prints ABC%2520D
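To see why collisions should not happen for ordinary input, here is a minimal sketch that encodes a few look-alike strings and decodes them again. The class name EncodeCollisionCheck and its two helper methods are made up for illustration. The idea is that if decoding always gives back the original, two different inputs cannot share an encoded form. This assumes well-formed strings and the same charset on both sides.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper class, named here only for illustration.
final class EncodeCollisionCheck {

    // Wraps URLEncoder so callers do not have to handle the checked exception.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Inverse of encode(): turns the encoded form back into the original text.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Inputs that look as if they might collide after encoding.
        for (String input : new String[] {"ABC D", "ABC%20D", "ABC+D"}) {
            String encoded = encode(input);
            System.out.println(input + " -> " + encoded + " -> " + decode(encoded));
        }
    }
}
```

One caveat for the filename use case: URLEncoder leaves '.', '-', '*' and '_' untouched, so the result may still need extra handling on file systems that reject '*' or treat names such as ".." specially.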
I got an HTML file that looks like this:
<body>
<p>Hello! <b>[NAME]%</b></p>
</body>
And what I got in my Java file is that:
String name = "John";
My question is:
How do I fill John into the [NAME]% placeholder in Java?
After doing so, how do I convert it to a base64-encoded string in Java?
Thank you for your help!
You are using a lot of characters that Java's regular-expression processor treats specially. If you have programmed Java before for text processing, the String.replace(String, String) method should accomplish what you are attempting to do.
There are three String replace methods that take string arguments. Two of them, though, take regular expressions, which would expect you to "escape" the brackets that you have typed.
Here is the text, copied from Oracle/Sun's Java documentation for: java.lang.String
String replace(CharSequence target, CharSequence replacement)
Replaces each substring of this string that matches the literal target
sequence with the specified literal replacement sequence.
String replaceAll(String regex, String replacement)
Replaces each substring of this string that matches the given regular
expression with the given replacement.
String replaceFirst(String regex, String replacement)
Replaces the first substring of this string that matches the given
regular expression with the given replacement.
Just so you are aware - the two that say "regex" in the parameter-list would expect the regex String to follow this format for pattern-matching purposes:
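// Regular-expression replacement with java.lang.String: several "escaped" characters!
// ALSO NOTE: inside a Java string literal each backslash must itself be escaped, hence "\\[".
String replacePattern = "\\[NAME\\]%";
// Strings are immutable, so the result has to be assigned back.
yourText = yourText.replaceFirst(replacePattern, "John");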
These "back-slashes from hell" are required because the regular-expression processor wants you to escape the '[' and the ']': they are metacharacters (reserved/special characters) in regex syntax. Please review the regular-expression section of the Java 7/8/9 documentation to understand how String.replaceFirst and String.replaceAll treat the regex argument. Alternatively, if you use String.replace, all Java expects is a literal character match, specifically:
yourText = yourText.replace("[NAME]%", "John");
Here is a link to Sun/Oracle's page on java.util.regex.Pattern:
https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
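As a rough illustration of the two routes described above, the sketch below fills the placeholder once with the literal String.replace and once with the regex-based replaceFirst. The class name PlaceholderFill and its method names are invented for this example. Pattern.quote and Matcher.quoteReplacement are used here as an alternative to escaping by hand, so neither the brackets in the placeholder nor a '$' or '\' in the name gets a special meaning.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
final class PlaceholderFill {

    private static final String PLACEHOLDER = "[NAME]%";

    // Literal replacement: no regular expressions involved, replaces every occurrence.
    static String fillLiteral(String template, String name) {
        return template.replace(PLACEHOLDER, name);
    }

    // Regex replacement of the first occurrence only. Pattern.quote escapes the
    // brackets; Matcher.quoteReplacement escapes '$' and '\' in the name.
    static String fillFirstRegex(String template, String name) {
        return template.replaceFirst(Pattern.quote(PLACEHOLDER), Matcher.quoteReplacement(name));
    }

    public static void main(String[] args) {
        String html = "<body>\n<p>Hello! <b>[NAME]%</b></p>\n</body>";
        System.out.println(fillLiteral(html, "John"));
        System.out.println(fillFirstRegex(html, "John"));
    }
}
```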
NOTE: The answer below is copied from Google's answer about Base64 encoding. I personally do not quite understand your question. Let me know if you are talking about UTF-8? Unicode? What do you mean by a "Base64-encoded String"?
What is the use of base64 encoding in Java?
String encodeToString(byte[] src)
Encodes the specified byte array into a String using the Base64
encoding scheme.
Base64.Encoder withoutPadding()
Returns an encoder instance that encodes equivalently to this one, but
without adding any padding character at the end of the encoded byte data.
OutputStream wrap(OutputStream os)
Wraps an output stream for encoding byte data using the Base64
encoding scheme.
What is base64 encoding in Java?
Base64 is a binary-to-text encoding scheme that represents binary data in a printable ASCII string format by translating it into a radix-64 representation. Each Base64 digit represents exactly 6 bits of binary data.
Here is a link to Sun's Page on the issue:
https://docs.oracle.com/javase/8/docs/api/java/util/Base64.Encoder.html
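If the question is simply how to turn the finished text into a Base64 string, a minimal sketch with java.util.Base64 (Java 8 and later) could look like the following. The class name Base64Text and its two methods are invented for this example, and UTF-8 is an assumption about which bytes the text should be converted to before encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper class, named here only for illustration.
final class Base64Text {

    // Base64 works on bytes, so the text is first converted to UTF-8 bytes.
    static String toBase64(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    // Reverses toBase64(): decode to bytes, then interpret them as UTF-8 text.
    static String fromBase64(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String html = "<body>\n<p>Hello! <b>John</b></p>\n</body>";
        String encoded = toBase64(html);
        System.out.println(encoded);
        System.out.println(fromBase64(encoded));
    }
}
```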
I am writing an app for SmartThings (www.smartthings.com) in their own IDE. I have an input field here that is supposed to be text input. I ask for a departure address:
section("Departing From:"){
input "departFrom", "text", title: "Address?"
}
When I put in the value Monterey, CA, it magically gets changed to a JSON array with the values [Monterey, CA].
I want to pass this value to an httpGet call, but I need to URL-encode it first to get rid of spaces, etc. I have tried URLEncoder with no success due to the JSON array.
I have tried join(",") with no luck, as it adds double quotes to the value.
How can I get a clean Monterey%2C%20CA URL-encoded value from this variable?
** Bear in mind that someone could enter any combination of numbers, spaces, and commas into this input as an address. The MapQuest API I am sending it to can handle all of these as long as there are no special characters and the spaces are URL-encoded.
Maybe try:
def l = ['Monterey', 'CA']
assert URLEncoder.encode(l.join(', ')).replaceAll('\\+','%20') == 'Monterey%2C%20CA'
When it comes to replacing + sign, please see here
There are different types of URL encoding, but in this case there are two: One that converts spaces to %20 and one that converts spaces to +.
For the first, you'd use UriUtils:
def yourEncodedString = UriUtils.encodeUri(yourString.toString(), "UTF-8")
For the second, you'd use URLEncoder:
def yourEncodedString = URLEncoder.encode(yourString.toString(), "UTF-8")
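Alternatively, you can keep URLEncoder and replace each + in its output with %20 to get what you want. (Switching the charset, for example to UTF-16, does not change how spaces are encoded.)
I've never had a fun time with UriUtils, so hopefully URLEncoder will work for you.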
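For reference, here is a plain-Java sketch of the "encode, then swap + for %20" idea from this thread. The class name PercentTwenty and the method encodeWithPercent20 are made up for illustration. The swap is safe because URLEncoder has already turned any literal + in the input into %2B, so every + left in its output stands for a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class, named here only for illustration.
final class PercentTwenty {

    // Form-style encoding via URLEncoder, then spaces rewritten from '+' to "%20".
    static String encodeWithPercent20(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        // The address parts joined back into one string, as in the question.
        String address = String.join(", ", "Monterey", "CA");
        System.out.println(encodeWithPercent20(address));
    }
}
```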
I have a string like this
"\x27\x18\xf6,\x03\x12\x8e\xfa\xec\x11\x0dHL"
when i put it in browser console, it automatically becomes something else:
"\x27\x18\xf6,\x03\x12\x8e\xfa\xec\x11\x0dHL"
"'ö,úìHL"
If I call charAt(x) on this string, I get:
"\x27\x18\xf6,\x03\x12\x8e\xfa\xec\x11\x0dHL".charAt(0)
"'"
"\x27\x18\xf6,\x03\x12\x8e\xfa\xec\x11\x0dHL".charAt(1)
""
"\x27\x18\xf6,\x03\x12\x8e\xfa\xec\x11\x0dHL".charAt(2)
"ö"
which IS what I want.
Now I want to implement a Java program that reads the string the same way as in browser.
The problem is, Java does not recognize the way this string is encoded. Instead, it treats it as a normal string:
"\\x27\\x18\\xf6,\\x03\\x12\\x8e\\xfa\\xec\\x11\\x0dHL".charAt(0) == '\\'
"\\x27\\x18\\xf6,\\x03\\x12\\x8e\\xfa\\xec\\x11\\x0dHL".charAt(1) == 'x'
"\\x27\\x18\\xf6,\\x03\\x12\\x8e\\xfa\\xec\\x11\\x0dHL".charAt(2) == '2'
What kind of encoding is this string in? What kind of encoding uses the prefix \x?
Is there a way to read it properly (get the same result as in browser)?
update: I found a solution -> i guess it is not the best, but it works for me:
StringEscapeUtils.unescapeJava("\\x27\\x18\\xf6,\\x03\\x12\\x8e\\xfa\\xec\\x11\\x0dHL".replace("\\x", "\\u00"))
thank you all for your replies :)
especially Ricardo Cacheira
Thank you
\x03 is a hexadecimal escape: the two hex digits after \x are the character's code (its ASCII value for codes below 0x80)
so this: "\x30\x31" is the same as : "01"
see that page: http://www.asciitable.com
Another thing is that when you paste your string into a Java string literal, your IDE converts any \ to \\
Java Strings use Unicode escapes, so this: "\x30\x31" in Java is: "\u0030\u0031";
You can't write the escape sequences \u000a and \u000d inside a Java string literal, because Unicode escapes are translated before the compiler parses the source; use \n and \r respectively instead.
So this "\u0027\u0018\u00f6,\u0003\u0012\u008e\u00fa\u00ec\u0011\rHL" is the conversion for Java of this: "\x27\x18\xf6,\x03\x12\x8e\xfa\xec\x11\x0dHL"
apache commons provides a helper for this:
StringEscapeUtils.unescapeJava(...)
Unescapes any Java literals found in the String. For example, it will turn a sequence of '\' and 'n' into a newline character, unless the '\' is preceded by another '\'.
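If pulling in Apache Commons is not an option, the same conversion can be sketched with java.util.regex alone. The class name HexEscapeDecoder and its decode method are invented for this example. It only handles the two-digit \xNN form shown in the question and leaves everything else untouched, so an already escaped backslash followed by x would be misread; that limitation is an assumption of the sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
final class HexEscapeDecoder {

    // Matches a literal backslash, the letter x, and exactly two hex digits.
    private static final Pattern HEX_ESCAPE = Pattern.compile("\\\\x([0-9A-Fa-f]{2})");

    // Replaces each \xNN sequence with the character whose code is NN (hex).
    static String decode(String raw) {
        Matcher matcher = HEX_ESCAPE.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            char decoded = (char) Integer.parseInt(matcher.group(1), 16);
            // quoteReplacement keeps a decoded '$' or '\' from being treated specially.
            matcher.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "\\x27\\x18\\xf6,\\x03\\x12\\x8e\\xfa\\xec\\x11\\x0dHL";
        String text = decode(raw);
        // Print the numeric code of each decoded character.
        for (int i = 0; i < text.length(); i++) {
            System.out.println(i + ": " + (int) text.charAt(i));
        }
    }
}
```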
I have a URL that looks like this:
Liberty%21%20ft.%20Whiskey%20Pete%20-%20Thunderfist%20%28Original%20Mix%29.mp3
I'm trying to extract just the words from it. Right now, I'm using string.replace("%21", "!") for each and every %20, %29, etc., because each sequence represents a different character or a space. Is there a way to just convert those symbols and numbers to what they actually mean?
Thanks.
Those symbols are URLEncoded representations of characters that can't legally exist in a URL. (%20 = a single space, etc)
You need to URL-decode those strings:
http://icfun.blogspot.com/2009/08/java-urlencode-and-urldecode-options.html
Official documentation here:
http://download.oracle.com/javase/6/docs/api/java/net/URLDecoder.html
It seems the input string is written using the URL encoding. Instead of writing all possible replacements manually (you can hardly cover all possibilities), you can use URLDecoder class in Java.
String input = "Liberty%21%20ft.%20Whiskey%20Pete...";
String decoded = URLDecoder.decode(input, "UTF-8");
I just came across something like this:
String sample = "somejunk+%3cfoobar%3e+morestuff";
Printed out, sample looks like this:
somejunk+<foobar>+morestuff
How does that work? U+003c and U+003e are the Unicode codes for the less than and greater than signs, respectively, which seems like more than a coincidence, but I've never heard of Java automatically doing something like this. I figured it'd be an easy thing to pop into Google, but it turns out Google doesn't like the percent sign.
That string is probably URL-encoded. You'd decode that in Java using the URLDecoder:
String res = java.net.URLDecoder.decode(sample, "UTF8");
You can do something like this,
String sample = "somejunk+%3cfoobar%3e+morestuff";
String result = URLDecoder.decode(sample.replaceAll("\\+", "%2B"), "UTF8");
Java does support Unicode escapes in char and String literals, but not URL encoding.
The Unicode escapes use '\uXXXX', where XXXX is a UTF-16 code unit written as four hexadecimal digits.
Curious tidbit: The grammar allows 'u' to occur multiple times, so that '\uuuuuuuu0041' is a valid Unicode escape (for 'A').
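To put the two mechanisms from this thread side by side, here is a small sketch. The class name EscapeKinds and its methods are made up for illustration. Unicode escapes are resolved by the compiler inside the literal, while percent sequences stay as plain text until something like URLDecoder is called. Note that URLDecoder also turns each + into a space unless the + is protected first, as in the answer above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper class, named here only for illustration.
final class EscapeKinds {

    // Plain URL decoding: percent sequences are decoded and '+' becomes a space.
    static String urlDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Protects literal plus signs by rewriting them as %2B before decoding.
    static String urlDecodeKeepingPlus(String s) {
        return urlDecode(s.replace("+", "%2B"));
    }

    public static void main(String[] args) {
        String sample = "somejunk+%3cfoobar%3e+morestuff";
        System.out.println(sample);                       // the compiler leaves %3c alone
        System.out.println(urlDecode(sample));            // pluses become spaces
        System.out.println(urlDecodeKeepingPlus(sample)); // pluses are kept
        // Unicode escapes, by contrast, are translated at compile time.
        System.out.println("\u003cfoobar\u003e");
        System.out.println("\uuuu0041");
    }
}
```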