Java : File.toURI().toURL() on Windows file - java

The system I'm running on is Windows XP, with JRE 1.6.
I do this :
public static void main(String[] args) {
    try {
        System.out.println(new File("C:\\test a.xml").toURI().toURL());
    } catch (Exception e) {
        e.printStackTrace();
    }
}
and I get this : file:/C:/test%20a.xml
How come the given URL doesn't have two slashes before the C: ? I expected file://C:.... Is it normal behaviour?
EDIT :
From Java source code : java.net.URLStreamHandler.toExternalForm(URL)
result.append(":");
if (u.getAuthority() != null && u.getAuthority().length() > 0) {
    result.append("//");
    result.append(u.getAuthority());
}
It seems that the Authority part of a file URL is null or empty, and thus the double slash is skipped. So what is the authority part of a URL and is it really absent from the file protocol?

That's an interesting question.
First things first: I get the same results on JRE6. I even get that when I lop off the toURL() part.
RFC2396 does not actually require two slashes. According to section 3:
The URI syntax is dependent upon the
scheme. In general, absolute URI are
written as follows:
<scheme>:<scheme-specific-part>
Having said that, RFC2396 has been superseded by RFC3986, which states
The generic URI syntax consists of a
hierarchical sequence of components
referred to as the scheme, authority,
path, query, and fragment.
URI       = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
hier-part = "//" authority path-abempty
          / path-absolute
          / path-rootless
          / path-empty
The scheme and path components are
required, though the path may be empty
(no characters). When authority is
present, the path must either be empty
or begin with a slash ("/") character.
When authority is not present, the
path cannot begin with two slash
characters ("//"). These restrictions
result in five different ABNF rules
for a path (Section 3.3), only one of
which will match any given URI
reference.
So, there you go. The URI that Java builds here has no authority component, so its path (/C:/test%20a.xml) is not allowed to begin with //, and file:/C:/... is a valid form.
However, that RFC didn't come around until 2005, and Java references RFC2396, so I don't know why it's following this convention, as file URLs before the new RFC have always had two slashes.
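To see why the double slash is skipped, it can help to print the components of the URI that File.toURI() builds. This is only a sketch: the class name FileUriParts and the relative file name are assumptions made for illustration, and Path.toUri() is included purely for comparison because, as far as I know, it emits the file:/// form for the same file.

```java
import java.io.File;
import java.net.URI;

class FileUriParts {
    // Hypothetical helper: the URI for a local file as File.toURI() builds it.
    static URI viaFile(String name) {
        return new File(name).getAbsoluteFile().toURI();
    }

    // The same file converted through java.nio.file.Path, for comparison.
    static URI viaPath(String name) {
        return new File(name).getAbsoluteFile().toPath().toUri();
    }

    public static void main(String[] args) {
        URI fromFile = viaFile("test a.xml");
        System.out.println(fromFile);                // single slash after "file:"
        System.out.println(fromFile.getScheme());    // the scheme, "file"
        System.out.println(fromFile.getAuthority()); // null, so there is nothing to put after "//"
        System.out.println(fromFile.getPath());      // decoded path, with the space restored
        System.out.println(viaPath("test a.xml"));   // three slashes after "file:"
    }
}
```

Both strings identify the same local file; they differ only in whether an (empty) authority is written out.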

To answer why you can have both:
file:/path/file
file:///path/file
file://localhost/path/file
RFC3986 (3.2.2. Host) states:
"If the URI scheme defines a default for host, then that default applies when the host subcomponent is undefined or when the registered name is empty (zero length). For example, the "file" URI scheme is defined so that no authority, an empty host, and "localhost" all mean the end-user's machine, whereas the "http" scheme considers a missing authority or empty host invalid."
So the "file" scheme translates file:///path/file to have a context of the end-user's machine even though the authority is an empty host.
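A quick way to check that the three spellings carry the same path is to parse each with java.net.URI and compare the components. The sketch below assumes nothing beyond the JDK; the class name FileUriForms and the describe helper are made up for this example. Note that java.net.URI reports a null authority both when the authority is absent and when it is empty.

```java
import java.net.URI;

class FileUriForms {
    // Summarises the authority, host and path that java.net.URI extracts.
    static String describe(String text) {
        URI uri = URI.create(text);
        return "authority=" + uri.getAuthority()
                + " host=" + uri.getHost()
                + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        // prints authority=null host=null path=/path/file
        System.out.println(describe("file:/path/file"));
        // prints authority=null host=null path=/path/file
        System.out.println(describe("file:///path/file"));
        // prints authority=localhost host=localhost path=/path/file
        System.out.println(describe("file://localhost/path/file"));
    }
}
```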

As far as using it in a browser is concerned, it doesn't matter. I have typically seen file:///..., but one, two or three slashes will all work. This makes me think (without looking at the Java documentation) that it would be normal behavior.

Related

URI.getHost() returns null

This prints null:
System.out.println(new URI("http://a.1a/").getHost());
But this prints a.1a:
System.out.println(new URL("http://a.1a/").getHost());
If all URLs are URIs (but not all URIs are URLs) shouldn't a valid URL that has a host component also have the same host component (instead of null) as a URI?
Look at the Javadoc:
https://docs.oracle.com/javase/8/docs/api/java/net/URI.html:
"Returns: The host component of this URI, or null if the host is undefined"
OK, so why is the host part of your particular URI ("http://a.1a/") undefined? Look at the RFC:
https://www.ietf.org/rfc/rfc2396.txt
Hostnames take the form described in Section 3 of [RFC1034] and
Section 2.1 of [RFC1123]: a sequence of domain labels separated by
".", each domain label starting and ending with an alphanumeric
character and possibly also containing "-" characters. The rightmost
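domain label of a fully qualified domain name will never start with a digit... To actually be "Uniform" as a resource
locator, a URL hostname should be a fully qualified domain name.
In "a.1a" the rightmost label, "1a", starts with a digit, so java.net.URI does not accept it as a hostname and getHost() returns null. java.net.URL does not apply this check.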
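The effect of that rule can be observed directly. In the sketch below (the class name HostParsing and the helper hasServerAuthority are assumptions for illustration), URI.parseServerAuthority() is used to ask java.net.URI whether it can read the authority as a host; the authority text itself stays available through getAuthority() even when getHost() is null.

```java
import java.net.URI;
import java.net.URISyntaxException;

class HostParsing {
    // True if java.net.URI can parse the authority as a server-based one (host with optional port).
    static boolean hasServerAuthority(String text) {
        try {
            return URI.create(text).parseServerAuthority().getHost() != null;
        } catch (URISyntaxException e) {
            return false; // authority is present but is not a valid hostname for java.net.URI
        }
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://a.1a/");
        System.out.println(uri.getHost());                      // prints null
        System.out.println(uri.getAuthority());                 // prints a.1a
        System.out.println(hasServerAuthority("http://a.1a/")); // prints false
        System.out.println(hasServerAuthority("http://a.a1/")); // prints true
    }
}
```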

How do I parse an HttpExchange request after an ending slash?

I have a request as follows:
localhost:8000/location/:01
My code takes as input an HttpContext request.
func(HttpExchange r) {
    String area_path = r.getRequestURI(); // Equals string "/location/"
}
How do I parse an HttpExchange correctly so I can pull out the "01" from this path and store it as a variable?
That (localhost:8000/location/:01) is not a valid URL or URI
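A URL starts with a scheme (protocol), e.g. http:, and this string has none, so a parser will read "localhost:" as the scheme. The colon in the path is a further hazard: RFC 3986 does permit ":" inside a path segment, but not every HTTP stack handles it the way you might expect, and percent-encoding it as %3A removes the ambiguity.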
Now ... it is unclear what the HTTP stack you are using will do with a syntactically incorrect URL / URI, but it could simply be ignoring the colon and the characters after it.
Your code looks a bit odd too. You have tagged the question as [java], but the code does not look like Java; func is not a Java keyword. Yet it also looks like you are using the com.sun.net.httpserver.HttpExchange Java class. I don't know what to make of that ...
My advice:
Don't use a colon character in the URL path.
If you must do it, then percent-encode the colon.
If you cannot encode it properly, then you may need to find and use a different framework for your HTTP request handling. One that will accept and handle a malformed URL / URI in the way that you want. (Good luck finding one!)
Unfortunately, the details in your question are too sketchy to give more detailed advice.
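For what it is worth, here is a minimal sketch of pulling the trailing part out of a request path once the URL is well formed. It assumes the handler was registered under "/location/" and that the id is sent either plainly or percent-encoded; the class name PathTail and both method names are hypothetical. HttpExchange.getRequestURI() returns a java.net.URI rather than a String, so the path has to be read from it with getPath().

```java
import com.sun.net.httpserver.HttpExchange;
import java.net.URI;

class PathTail {
    // Returns the decoded request path after the given prefix, or null if the prefix does not match.
    static String tailAfter(URI requestUri, String prefix) {
        String path = requestUri.getPath(); // decoded path, query string excluded
        if (path == null || !path.startsWith(prefix)) {
            return null;
        }
        return path.substring(prefix.length());
    }

    // Inside an HttpHandler: uses the path the context was registered under as the prefix.
    static String tailAfterContext(HttpExchange exchange) {
        String prefix = exchange.getHttpContext().getPath();
        return tailAfter(exchange.getRequestURI(), prefix);
    }

    public static void main(String[] args) {
        System.out.println(tailAfter(URI.create("/location/01"), "/location/"));     // prints 01
        System.out.println(tailAfter(URI.create("/location/%3A01"), "/location/"));  // prints :01
    }
}
```

If the context was registered as "/location" without the trailing slash, the returned tail would start with "/", so the prefix should match however the context was created.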

Is a URI containing a comma valid in a HTTP Link header?

Is the following HTTP Link header, containing a comma, valid?
Link: <http://www.example.com/foo,bar.html>; rel="canonical"
RFC5988 says:
Note that extension relation types are REQUIRED to be absolute URIs in
Link headers, and MUST be quoted if they contain a semicolon (";") or
comma (",") (as these characters are used as delimiters in the header
itself).
This doesn't cover the #link-value however. That must be a URI-Reference as per RFC 3987 which seems to allow this. The link header itself can also have multiple values, from RFC5988 section 5.5:
Link: </TheBook/chapter2>;
      rel="previous"; title*=UTF-8'de'letztes%20Kapitel,
      </TheBook/chapter4>;
      rel="next"; title*=UTF-8'de'n%c3%a4chstes%20Kapitel
I'm parsing this link header in Java using BasicHeaderValueParser from Apache HttpCore 4.4.9 using the following code:
final String linkHeader = "<http://www.example.com/foo,bar.html>; rel=\"canonical\"";
final HeaderElement[] parsedHeaders = BasicHeaderValueParser.parseElements(linkHeader, null);
for (HeaderElement headerElement : parsedHeaders) {
    System.out.println(headerElement);
}
which tokenises on the comma and prints the following:
<http://www.example.com/foo
bar.html>; rel=canonical
Is this valid behaviour?
The comma is of course valid.
What you're missing is that the BasicHeaderValueParser is not generic. It only supports certain HTTP header fields, and "Link" isn't one of them (see the syntax description in https://hc.apache.org/httpcomponents-core-ga/httpcore/apidocs/org/apache/http/message/HeaderValueParser.html).
RFC 3986, section 3.3 states that a URI path may contain sub-delimiters, which are defined in section 2.2 and include the comma (",").
RFC 5988 states that it is the relation types, not the URI, that must be quoted if they contain a comma.
I think there is very little room for interpretation and it's IMHO an incomplete implementation on the HttpCore side.
The BasicHeaderValueParser uses ',' as the element delimiter, neglecting the fact that this character is valid inside header field values. That is probably OK for most cases, although not 100% compliant.
You may, however, provide your own custom parser as the second argument (instead of null).
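As an illustration of what such a parser has to do differently, here is a standalone sketch that splits a Link header value into its link-values while ignoring commas inside <...> and inside quoted strings. It deliberately does not use or implement any HttpCore interface; the class name LinkHeaderSplitter is an assumption, and the code is not a complete RFC 5988 parser (it does not break link-values into parameters, for example).

```java
import java.util.ArrayList;
import java.util.List;

class LinkHeaderSplitter {
    // Splits on commas that are outside <...> and outside "..." only.
    static List<String> split(String headerValue) {
        List<String> links = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inAngle = false;
        boolean inQuotes = false;
        for (int i = 0; i < headerValue.length(); i++) {
            char c = headerValue.charAt(i);
            if (inQuotes) {
                if (c == '\\' && i + 1 < headerValue.length()) {
                    // keep a backslash-escaped character inside a quoted string as-is
                    current.append(c).append(headerValue.charAt(++i));
                    continue;
                }
                if (c == '"') inQuotes = false;
            } else if (inAngle) {
                if (c == '>') inAngle = false;
            } else if (c == '<') {
                inAngle = true;
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                addIfNotBlank(links, current); // a top-level comma ends the current link-value
                current.setLength(0);
                continue;
            }
            current.append(c);
        }
        addIfNotBlank(links, current);
        return links;
    }

    private static void addIfNotBlank(List<String> links, StringBuilder text) {
        String trimmed = text.toString().trim();
        if (!trimmed.isEmpty()) links.add(trimmed);
    }

    public static void main(String[] args) {
        // prints a single element containing the whole link-value
        System.out.println(split("<http://www.example.com/foo,bar.html>; rel=\"canonical\""));
    }
}
```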

Is the `file://` uri prefix something I can hardcode?

I was wondering if I can hardcode the file:// prefix into one of my functions in Android.
The function is supposed to determine whether or not the given link points to an external resource on the web, or an internal resource inside the phone itself
public Uri generate_image_uri(String link)
{
    // link can be "1DCHiI2.jpg"
    // link can be "file://smiley_face.jpg"
    if (!link.startsWith("file://"))
        return Uri.parse("https://i.imgur.com/" + link);
    else
        return Uri.parse(link);
}
Is this advisable? Or is there a more "fault tolerant" way of getting file://? maybe some function like getProperFilePrefixForThisAndroidVersion();?
In order to clarify my question:
given the following code
(new File(getFilesDir(), "hello_world.jpg")).toString();
Is it safe to assume within reasonable probability that the resulting string will always start with file:// in all current and future Android versions?
Given:
(new File(getFilesDir(), "hello_world.jpg")).toString();
is it safe to assume within reasonable probability that the resulting string will always start with file:// in all current and future Android versions?
No.
According to the javadoc for File on Android, File.toString returns:
"... the pathname string of this abstract pathname. This is just the string returned by the getPath() method."
Not a "file://" URL.
If you want to get a properly formed "file:" URL, do this:
new File(...).toURI().toString()
Now, technically the protocol for a URL (i.e. "file") is case insensitive (see "Is the protocol name in URLs case sensitive?"),
which means that "FILE://" or "File://" etc. are technically valid alternatives.
However, the probability that the above expression would ever emit anything other than the lower-case form of the protocol is (um) vanishingly small [1].
[1] It would entail monumentally stupid decision making by a number of people. And they are NOT stupid people.
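Rather than matching a hard-coded prefix, one option is to parse the link and compare its scheme, which also copes with upper-case spellings and with the single-slash form that File.toURI() produces. The sketch below uses plain java.net.URI so that it runs outside Android; the class name LinkKind and the method isFileLink are hypothetical, and on Android the same idea could be expressed with android.net.Uri and its getScheme() method.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

class LinkKind {
    // True if the link parses as a URI whose scheme is "file", in any letter case.
    static boolean isFileLink(String link) {
        try {
            String scheme = new URI(link).getScheme();
            return scheme != null && scheme.equalsIgnoreCase("file");
        } catch (URISyntaxException e) {
            return false; // not parseable as a URI, so not treated as a file link
        }
    }

    public static void main(String[] args) {
        File file = new File("hello_world.jpg").getAbsoluteFile();
        System.out.println(file);          // a plain path, with no scheme in front
        System.out.println(file.toURI());  // a file: URI
        System.out.println(isFileLink("1DCHiI2.jpg"));            // prints false
        System.out.println(isFileLink("file://smiley_face.jpg")); // prints true
    }
}
```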

What is the scheme-specific part in a URI?

I can't find any explanation as to what exactly the "scheme-specific part" of a URI is.
From wikipedia :
All URIs and absolute URI references are formed with a scheme name,
followed by a colon character (":"), and the remainder of the URI
called (in the outdated RFCs 1738 and 2396, but not the current STD
66/RFC 3986) the scheme-specific part.
The scheme-specific-part is what you have after the :.
Example:
http://stackoverflow.com/questions/24077453/
scheme: http
scheme-specific-part: //stackoverflow.com/questions/24077453/
Each URI begins with a scheme name that refers to a specification for assigning identifiers within that scheme. As such, the URI syntax is a federated and extensible naming system wherein each scheme's specification may further restrict the syntax and semantics of identifiers using that scheme.
See this section of the URI rfc https://www.rfc-editor.org/rfc/rfc3986#section-3.1
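In Java the split can be inspected with java.net.URI, which still exposes the older terminology through getSchemeSpecificPart(). A small sketch, with the class name SchemeParts and the mailto address chosen only for illustration:

```java
import java.net.URI;

class SchemeParts {
    // Returns { scheme, scheme-specific part } as java.net.URI reports them.
    static String[] split(String text) {
        URI uri = URI.create(text);
        return new String[] { uri.getScheme(), uri.getSchemeSpecificPart() };
    }

    public static void main(String[] args) {
        String[] http = split("http://stackoverflow.com/questions/24077453/");
        System.out.println(http[0]); // prints http
        System.out.println(http[1]); // prints //stackoverflow.com/questions/24077453/

        String[] mail = split("mailto:someone@example.com");
        System.out.println(mail[0]); // prints mailto
        System.out.println(mail[1]); // prints someone@example.com
    }
}
```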
The scheme simply defines which protocol the URL uses, such as HTTP or HTTPS.
So include it in the URL for it to work:
With scheme
http://localhost:8080/api/notes
Without scheme
localhost:8080/api/notes
