I have the URL with the following query string
equipmentAccessoryRoute=LFVR+BASICACC
When I do request.getParameter("equipmentAccessoryRoute") it returns 'LFVR BASICACC' in a string variable, replacing plus sign with space.
To resolve this issue I did something like this
String accessoryRoute = java.net.URLEncoder.encode(request.getParameter("equipmentAccessoryRoute"),"UTF-8");
It worked perfectly but now yt is not working for the following query string (which was working before)
`equipmentAccessoryRoute=C1000IP5EL#-A`
Decoding converts this into 'C1000IP5EL%40-A' and stores into a string.
I am really confused. I tried to learn URL encoding but find it very hard to understand.
URL - Uniform Resource Locator
URLs can only be sent over the Internet using the ASCII
character-set. Since URLs often contain characters outside the ASCII
set, the URL has to be converted into a valid ASCII format.
URL encoding replaces unsafe ASCII characters with a "%" followed by
two hexadecimal digits.
URLs cannot contain spaces. URL encoding normally replaces a space
with a plus (+) sign or with %20.
Hope this helps.
Related
I have an API at the following path
/v0/segments/ch/abc/view/status/ACTIVE?sc=%s&expiryGteInMs=%d
I am building a Client using the URIBuilder in Java.
return UriBuilder
.fromUri(config.getHost())
.path(String.format(config.getPath(),request.List(), request.getTime()))
.build();
The request contains a list to be substituted in place of %s and the time to be substituted in place of %d. But the request being formed has a path like this
/v0/segments/ch/abc/view/status/ACTIVE%3Fsc=FK,GR&expiryGteInMs=1611081000000
Basically the "?" character is being replaced by %3F. Can somebody help me with this, please?
P.S: I know that we can use the ".queryParam" option available in URIBuilder. Looking for the actual reason why this is happening?
Most probably library that you are using is encoding url, and ? encodes to %3F.
Why this happens(in short): url could contain only specific set of character, and ? is not one of them, so, in order to transfer this character, we should encode it (so called Percent-encoding).
A bit longer explanation (taken from here):
URL encoding converts characters into a format that can be transmitted over the Internet.
URLs can only be sent over the Internet using the ASCII character-set.
Since URLs often contain characters outside the ASCII set, the URL has to be converted into a valid ASCII format.
URL encoding replaces unsafe ASCII characters with a "%" followed by two hexadecimal digits.
URLs cannot contain spaces. URL encoding normally replaces a space with a plus (+) sign or with %20.
I got an HTML file that looks like this:
<body>
<p>Hello! <b>[NAME]%</b></p>
</body>
And what I got in my Java file is that:
String name = "John";
My question is:
How do that fill John into the [Name]% in Java?
After doing so, how do I convert it to a base64-encoded string in Java?
Thank you for your help!
You are using a lot of characters that Java's regular-expression processor likes to haggle with. I would think that if you have programmed Java before for text-processing, then the String.replace(String, String); method would accomplish what you are attempting to do.
There are three String replace methods. Two of them, though, require regular-expressions. Regular-expressions would expect you to "escape" the brackets that you have typed.
Here is the text, copied from Oracle/Sun's Java documentation for: java.lang.String
String replace(CharSequence target, CharSequence replacement)
Replaces each substring of this string that matches the literal target
sequence with the specified literal replacement sequence.
String replaceAll(String regex, String replacement)
Replaces each substring of this string that matches the given regular
expression with the given replacement.
String replaceFirst(String regex, String replacement)
Replaces the first substring of this string that matches the given
regular expression with the given replacement.
Just so you are aware - the two that say "regex" in the parameter-list would expect the regex String to follow this format for pattern-matching purposes:
// Regular-Expression Programming with java.lang.String - Several "Escaped" Characters!
// ALSO NOTE: Back-slashes need to be twice-escaped!
String replacePattern = "\\[NAME\\]%";
yourText.replaceFirst(replacePattern, "John");
These "back-slashes from hell" are required because the Regular Expressions Processor wants you to escape the '[' and the ']' because they are key-words (reserved/special characters) to the processor's system. Please review Regular Expressions in the Java 7/8/9 documentation to understand how String.replaceFirst and String.replaceAll work vis-a-vis the regex variable. Alternatively, if you use String.replace, all Java would expect is a direct character match, specifically:
yourText = yourText.replace("[NAME]%", "John");
Here is a link to Sun/Oracle's page on java.util.regex.Pattern:
https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
NOTE: Answer below is copied Google's Answer about base64 Encoding. I personally do not quite understand your question. Let me know if you are talking about UTF-8? UniCode? What do you mean by a "Base64 encoded String"?
What is the use of base64 encoding in Java? Encodes the specified byte array into a String using the Base64 encoding scheme. Returns an
encoder instance that encodes equivalently to this one, but without
adding any padding character at the end of the encoded byte data.
Wraps an output stream for encoding byte data using the Base64
encoding scheme.
What is base64 encoding in Java?
Base64 is a binary-to-text encoding scheme that represents binary data in a printable ASCII string format by translating it into a radix-64 representation. Each Base64 digit represents exactly 6 bits of binary data.Dec 6, 2017
Here is a link to Sun's Page on the issue:
https://docs.oracle.com/javase/8/docs/api/java/util/Base64.Encoder.html
I have a URL that looks like this:
Liberty%21%20ft.%20Whiskey%20Pete%20-%20Thunderfist%20%28Original%20Mix%29.mp3
I'm trying to extract just the words from it. Right now, I'm using string.replace("%21", "!") for each and every %20, %29, etc. because each segment represent different characters or spaces. Is there a way to just covert those symbols and numbers to what they actually mean?
Thanks.
Those symbols are URLEncoded representations of characters that can't legally exist in a URL. (%20 = a single space, etc)
You need to UrlDecode those strings:
http://icfun.blogspot.com/2009/08/java-urlencode-and-urldecode-options.html
Official documentation here:
http://download.oracle.com/javase/6/docs/api/java/net/URLDecoder.html
It seems the input string is written using the URL encoding. Instead of writing all possible replacements manually (you can hardly cover all possibilities), you can use URLDecoder class in Java.
String input = "Liberty%21%20ft.%20Whiskey%20Pete...";
String decoded = URLDecoder.decode(input, "UTF-8");
I have to use HttpClient 2.0 (can not use anything newer), and I am running into the next issue. When I use the method (post, in that case), it "codify" the parameters to the Hexadecimal ASCII code, and the "spaces" turned into "+" (something that the receiver don't want).
Does anyone know a way to avoid it?
Thanks a lot.
Even your browser does that, converting space character into +. See here http://download.oracle.com/javase/1.5.0/docs/api/java/net/URLEncoder.html
It encodes URL, converts to UTF-8 like string.
When encoding a String, the following rules apply:
The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same.
The special characters ".", "-", "*", and "_" remain the same.
The space character " " is converted into a plus sign "+".
All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.
Also, see here http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1
Control names and values are escaped. Space characters are replaced by +', and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., `%0D%0A').
The control names/values are listed in the order they appear in the document. The name is separated from the value by =' and name/value pairs are separated from each other by&'.
To answer your question, if you do not want to encode. I guess, URLDecoder.decode will help you to undo the encoded string.
You could in theory avoid this by constructing the query string or request body containing parameters by hand.
But this would be a bad thing to do, because the HTML, HTTP, URL and URI specs all mandate that reserved characters in request parameters are encoded. And if you violate this, you may find that server-side HTTP stacks, proxies and so on reject your requests as invalid, or misbehave in other ways.
The correct way to deal with this issue is to do one of the following:
If the server is implemented in Java EE technology, use the relevant servlet API methods (e.g. ServletRequest.getParam(...)) to fetch the request parameters. These will take care of any decoding for you.
If the parameters are part of a URL query string, you can instantiate a Java URL or URI object and use the getter to return you the query with the encoding removed.
If your server is implemented some other way (or if you need to unpick the request URL's query string or POST data yourself), then use URLDecoder.decode or equivalent to remove the % encoding and replace +'s ... after you have figured out where the query and parameter boundaries, etc are.
I am working with apache http client 4 for all of my web accesses.
This means that every query that I need to do has to pass the URI syntax checks.
One of the sites that I am trying to access uses UNICODE as the url GET params encoding, i.e:
http://maya.tase.co.il/bursa/index.asp?http://maya.tase.co.il/bursa/index.asp?view=search&company_group=147&srh_txt=%u05E0%u05D9%u05D1&arg_comp=&srh_from=2009-06-01&srh_until=2010-02-16&srh_anaf=-1&srh_event=9999&is_urgent=0&srh_company_press=
(the param "srh_txt=%u05E0%u05D9%u05D1" encodes srh_txt=ניב in UNICODE)
The problem is that URI doesn't support UNICODE encoding(it only supports UTF-8)
The really big issue here, is that this site expect it's params to be encoded in UNICODE, so any attempts to convert the url using String.format("http://...srh_txt=%s&...",URLEncoder.encode( "ניב" , "UTF8"))
results in a url which is legal and can be used to construct a URI but the site response to it with an error message, since it's not the encoding that it expects.
by the way URL object can be created and even used to connect to the web site using the non converted url.
Is there any way of creating URI in non UTF-8 encoding?
Is there any way of working with apache httpclient 4 with regular URL(and not URI)?
thanks,
Niv
(the param "srh_txt=%u05E0%u05D9%u05D1" encodes srh_txt=ניב in UNICODE)
It doesn't really. That's not URL-encoding and the sequence %u is invalid in a URL.
%u05E0%u05D9%u05D1" encodes ניב only in JavaScript's oddball escape syntax. escape is the same as URL-encoding for all ASCII characters except for +, but the %u#### escapes it produces for Unicode characters are completely of its own invention.
(One should, in general, never use escape. Using encodeURIComponent instead produces the correct URL-encoded UTF-8, ניב=%D7%A0%D7%99%D7%91.)
If a site requires %u#### sequences in its query string, it is very badly broken.
Is there any way of creating URI in non UTF-8 encoding?
Yes, URIs may use any character encoding you like. It is conventionally UTF-8; that's what IRI requires and what browsers will usually submit if the user types non-ASCII characters into the address bar, but URI itself concerns itself only with bytes.
So you could convert ניב to %F0%E9%E1. There would be no way for the web app to tell that those bytes represented characters encoded in code page 1255 (Hebrew, similar to ISO-8859-8). But it does appear to work, on the link above, which the UTF-8 version does not. Oh dear!