Is the `file://` uri prefix something I can hardcode?

I was wondering if I can hardcode the file:// prefix into one of my functions in Android.
The function is supposed to determine whether the given link points to an external resource on the web or to an internal resource on the phone itself.
public Uri generate_image_uri(String link)
{
    // link can be "1DCHiI2.jpg"
    // link can be "file://smiley_face.jpg"
    if (!link.startsWith("file://"))
        return Uri.parse("https://i.imgur.com/" + link);
    else
        return Uri.parse(link);
}
Is this advisable? Or is there a more "fault tolerant" way of getting file://? Maybe some function like getProperFilePrefixForThisAndroidVersion();?
In order to clarify my question:
given the following code
(new File(getFilesDir(), "hello_world.jpg")).toString();
Is it safe to assume within reasonable probability that the resulting string will always start with file:// in all current and future Android versions?

Given:
(new File(getFilesDir(), "hello_world.jpg")).toString();
is it safe to assume within reasonable probability that the resulting string will always start with file:// in all current and future Android versions?
No.
According to the javadoc for File on Android, File.toString returns:
"... the pathname string of this abstract pathname. This is just the string returned by the getPath() method."
Not a "file://" URL.
If you want to get a properly formed "file://" URL, do this:
new File(...).toURI().toString()
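As a minimal sketch of the difference, the following compares File.toString() with File.toURI().toString(), and checks the parsed scheme instead of a hardcoded string prefix. The class name FileUriDemo, the helper isFileUri, and the sample path are hypothetical and only for illustration. Note that on a standard JDK the URI produced by File.toURI() typically starts with file:/ (a single slash), so a startsWith("file://") test may not match it; comparing the scheme is the more tolerant check.

```java
import java.io.File;
import java.net.URI;

class FileUriDemo {
    // Returns true when the link parses as a URI whose scheme is "file".
    // The comparison ignores case because URI schemes are case-insensitive.
    static boolean isFileUri(String link) {
        try {
            return "file".equalsIgnoreCase(URI.create(link).getScheme());
        } catch (IllegalArgumentException e) {
            // Not a syntactically valid URI, so not a file URI either.
            return false;
        }
    }

    public static void main(String[] args) {
        File f = new File("/data/hello_world.jpg"); // hypothetical path
        System.out.println(f.toString());           // plain pathname, no scheme
        System.out.println(f.toURI().toString());   // URI with the "file" scheme
        System.out.println(isFileUri("file://smiley_face.jpg"));
        System.out.println(isFileUri("1DCHiI2.jpg"));
    }
}
```

On Android the same idea applies to android.net.Uri, whose getScheme() method can be compared against "file" in the same way.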
Now, technically the protocol for a URL (i.e. "file") is case insensitive:
Is the protocol name in URLs case sensitive?
Which means that "FILE://" or "File://" etc are technically valid alternatives.
However, the probability that the above expression would ever emit anything other than the lower-case form of the protocol is (um) vanishingly small [1].
[1] It would entail monumentally stupid decision making by a number of people. And they are NOT stupid people.

Related

How do I parse an HttpExchange request after an ending slash?

I have a request as follows:
localhost:8000/location/:01
My code takes as input an HttpContext request.
func(HttpExchange r) {
String area_path = r.getRequestURI(); // Equals string "/location/"
}
How do I parse an HttpExchange correctly so I can pull out the "01" from this path and store it as a variable?

That (localhost:8000/location/:01) is not a valid URL or URI
A plain colon character is not legal in the path of a URL or URI. If you want to put a colon in the path, it must be percent-encoded. Furthermore, if this was a URL, it would start with a protocol; e.g. http:.
Now ... it is unclear what the HTTP stack you are using will do with a syntactically incorrect URL / URI, but it could simply be ignoring the colon and the characters after it.
Your code looks a bit odd too. You have tagged the question as [java], but func is not a Java keyword, so the snippet is not valid Java as written. It also looks like you are using the com.sun.net.httpserver.HttpExchange Java class. I don't know what to make of that ...
My advice:
Don't use a colon character in the URL path.
If you must do it, then percent-encode the colon.
If you cannot encode it properly, then you may need to find and use a different framework for your HTTP request handling. One that will accept and handle a malformed URL / URI in the way that you want. (Good luck finding one!)
Unfortunately, the details in your question are too sketchy to give more detailed advice.
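For what it is worth, HttpExchange.getRequestURI() in com.sun.net.httpserver returns a java.net.URI, so once the colon is percent-encoded the last path segment can be pulled out with plain string handling. The sketch below assumes a request such as http://localhost:8000/location/%3A01; the class name PathSegments and the helper lastSegment are hypothetical.

```java
import java.net.URI;

class PathSegments {
    // Returns the text after the final '/' of the decoded path,
    // or an empty string when the URI has no path.
    static String lastSegment(URI requestUri) {
        String path = requestUri.getPath(); // getPath() decodes %3A to ':'
        if (path == null) {
            return "";
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://localhost:8000/location/%3A01");
        System.out.println(lastSegment(uri));
    }
}
```

Inside a handler this would be called as lastSegment(exchange.getRequestURI()), where exchange is the HttpExchange passed to the handler.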

Java - Retain only the Rest API base URL of a Rest API

Small question regarding how to use java to retain only the base URL of a rest API please.
As input, many strings, all valid rest APIs.
For instance, the inputs:
https://some-host.com/v1/someapi
https://another-host.fr/api/compute
https://somewhere.host.com/public/api/v3/getsomething
I would like to retain only the base part of each URL (shown in bold in the original post): the https, the : and the slashes, and the host name. Everything that comes after the host should be discarded.
Currently, I am trying some kind of string.split based on the / character, then trying to re-concat the arrays, but I have a feeling I am not going to the right direction.
What would be the most appropriate way please?
Thank you.

You could just try java.net.URL or java.net.URI. They behave pretty similarly.
For example:
URL url = new URL("http://example.com/a/b/c");
url.getProtocol();
url.getHost();
url.getPath();
or:
URI uri = new URI("http://example.com/a/b/c");
uri.getScheme();
uri.getHost();
uri.getPath();
There are several methods in both classes to extract lots of different parts.
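Putting those accessors together, a minimal sketch for keeping only the scheme and host might look like this. The class name BaseUrl and the helper baseUrl are hypothetical; getAuthority() is used rather than getHost() on the assumption that a port, if present, should be kept.

```java
import java.net.URI;
import java.net.URISyntaxException;

class BaseUrl {
    // Keeps the scheme and authority (host plus optional port) and
    // drops the path, query and fragment.
    static String baseUrl(String input) throws URISyntaxException {
        URI uri = new URI(input);
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(baseUrl("https://some-host.com/v1/someapi"));
        System.out.println(baseUrl("https://somewhere.host.com/public/api/v3/getsomething"));
    }
}
```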

Servlet Real Path

I am running a webapp under the directory blog. (e.g. www.example.com/blog).
I would like to get the real filesystem path of a request.
e.g. www.example.com/blog/test-file.html ->
/usr/share/tomcat7/webapps/blog/test-file.html
I tried the following:
public String realPath(HttpServletRequest request, ServletContext servletContext){
String requestURI = request.getRequestURI();
String realPath = servletContext.getRealPath(requestURI);
return realPath;
}
However this returns
/usr/share/tomcat7/webapps/blog/blog/test-file.html
What is the correct way to do this?

Short answer
To get the result you want, use HttpServletRequest#getServletPath() method as an argument to getRealPath() method.
This is the closest to what you want to accomplish (read the note below).
Explanation
The reason you're getting such path (with double blog) is that you're using the result returned by the getRequestURI() method.
The getRequestURI() method returns the path starting with application context. In your case it will be:
/blog/test-file.html
What happens then is that the getRealPath() method appends the string returned by the getRequestURI() method to the real/physical path of the folder where your application resides on the file system, which in your case is:
/usr/share/tomcat7/webapps/blog/
So the resulting path is:
/usr/share/tomcat7/webapps/blog/blog/test-file.html
That is the reason for your double blog issue.
IMPORTANT NOTE
DISCLAIMER
Maybe the OP is already aware of the information written below, but it is written for the sake of completeness.
The real path you are trying to get does not mean you are getting, well, the real path on your file system. The url-pattern configured in web.xml (or, if you're using Servlet 3.0+, in the related annotation) is actually a logical/virtual path, which may or may not relate to the actual, physical path on a file system, i.e. the patterns (paths) specified do not need to exist physically.
Also quote from the ServletContext.getRealPath(String) documentation (emphasis mine):
Gets the real path corresponding to the given virtual path.

Java: How to easily check if a URL was already shortened?

If I have a general url (not restricted to twitter or google) like this:
http://t.co/y4o14bI
is there an easy way to check if this url is shortened?
In the above case, I as a human can of course see that it was shortened, but is there an automatic and elegant way?

You could do a request to the URL, look if you get redirected and if so, assume it's a shortening service. For this you'd have to read the HTTP status codes.
On the other hand, you could whitelist some URL shortening services (t.co, bit.ly, and so on) and assume all links to those domains are shortened.
Drawback of the first method is that it isn't certain, some sites use redirects internally. The drawback of the second method is that you'd have to keep adding shortening services, although only a few are used widely.

One signal may be to request the URL and see if it results in a redirect to another domain. However, without a good definition of what "shortened" means, there is no generic way.

If you know all the domains that can be used to shorten your URLs, check whether the URL starts with one of them:
String[] domains = {"bit.ly", "t.co" /* ... */};
for (String domain : domains) {
    if (url.startsWith("http://" + domain)) {
        return true;
    }
}
return false;
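A variation on the loop above, offered only as a sketch: comparing the parsed host against the list avoids two pitfalls of a startsWith check, namely missing https:// links and matching unrelated hosts such as t.co.example.com. The class name ShortenerList and the helper isKnownShortener are hypothetical, and the two domains are simply the ones mentioned in this thread, not a complete list.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

class ShortenerList {
    // Illustrative list only; extend it with the shorteners you care about.
    static final Set<String> KNOWN_SHORTENERS = Set.of("bit.ly", "t.co");

    // Returns true when the URL's host is exactly one of the known shorteners.
    static boolean isKnownShortener(String url) {
        try {
            String host = URI.create(url).getHost();
            return host != null
                    && KNOWN_SHORTENERS.contains(host.toLowerCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return false; // not a parseable URI
        }
    }

    public static void main(String[] args) {
        System.out.println(isKnownShortener("http://t.co/y4o14bI"));
        System.out.println(isKnownShortener("http://t.co.example.com/x"));
    }
}
```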

You can't: You will have to work by assumption.
Assumption:
Does www exist in url.
Does the server name end with a valid domain (e.g. com, edu, etc.) or does it have co.xx where xx is a valid country or organization code.
And you can add more assumptions based on other URL shortening links.

You can't.
You can only check if you list a couple of shorteners and check if the url starts with it.
You can also try checking whether the url is shorter than a given length (and contains path/query string), but some shorteners (tinyurl for example) may have longer urls than normal sites (aol.com)
I would prefer the list of known shorteners.

Here's what you could do in Java, groovy and the like.
Get the url you want to test;
Open the url with HttpURLConnection
Check the response code
If it is a valid code, 200 for example, then you can retrieve the URL string from the connection object: in long form if it was shortened, or back in its original form if it wasn't.
We all love to see some code, don't we? It's crude, but hey!
// needs: import java.net.HttpURLConnection; import java.net.URL;
String addr = "http://t.co/y4o14bI";
URL url = new URL(addr);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
if (connection.getResponseCode() == 200) {
    // getURL() reflects any redirects that were followed while connecting
    String longUrl = connection.getURL().toString();
    System.out.println(longUrl);
} else {
    // You decide what you want to do here!
}

Actually, you as a human, can't. The only way you know that it's shortened is that it's a t.co domain. The y4o14bI could be a CMS identifier for all you know.
The best way would be to use a list of known shortener urls, and lookup against that.
And even then you would have problems. I use bit.ly with a personal domain, wtn.gd
So http://wtn.gd/random would also be a shortened URL.
You could maybe do a HTTP HEAD-request, and check for a 301/302 ?

If you request a URL like this, your HttpClient should receive an HTTP redirect instead of an HTML page. This wouldn't be proof, but at least a hint.

Evaluate the URL and look for some clues:
the Path meets certain criteria
only has one step (i.e. not multiple slashes)
does not end with filename extensions
not longer than X characters (would need to evaluate various URL shortening services and adjust the upper bounds for the max token length)
HttpUrlConnection returns a redirect responseCode (i.e. 301, 302)
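The path-based clues above can be sketched without any network access. Everything named here is an assumption for illustration: the class ShortUrlHeuristics, the helper looksShortened, and in particular the bound MAX_TOKEN_LENGTH, which stands in for the unspecified X and would need tuning against real shortening services. The redirect check is left out because it requires an actual HTTP request.

```java
import java.net.URI;

class ShortUrlHeuristics {
    // Hypothetical upper bound on the token length (the "X" above).
    static final int MAX_TOKEN_LENGTH = 10;

    // Applies the path clues: one path step, no filename extension,
    // and a token no longer than MAX_TOKEN_LENGTH.
    static boolean looksShortened(String url) {
        URI uri;
        try {
            uri = URI.create(url);
        } catch (IllegalArgumentException e) {
            return false; // not a parseable URI
        }
        String path = uri.getPath();
        if (path == null || path.length() < 2) {
            return false; // no token after the host
        }
        String token = path.substring(1);          // drop the leading '/'
        return !token.contains("/")                // only one step
                && !token.contains(".")            // no filename extension
                && token.length() <= MAX_TOKEN_LENGTH;
    }

    public static void main(String[] args) {
        System.out.println(looksShortened("http://t.co/y4o14bI"));
        System.out.println(looksShortened("http://example.com/a/b/c"));
    }
}
```

As the other answers point out, this is only a hint: a short CMS identifier on an ordinary site would pass the same checks.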

I would suggest using android.util.Patterns.WEB_URL to find the URLs in the input first:
// needs: import java.util.ArrayList; import java.util.List; import java.util.regex.Matcher;
public static List<String> findUrls(String input) {
    List<String> links = new ArrayList<>();
    Matcher m = android.util.Patterns.WEB_URL.matcher(input);
    while (m.find()) {
        String url = m.group();
        links.add(url);
    }
    return links;
}

Use an unshortening service such as https://unshorten.me
They have an API as well: https://unshorten.me/api
If the URL is shortened it will return the original URL.
If not you will get the same one back.

Java : File.toURI().toURL() on Windows file

The system I'm running on is Windows XP, with JRE 1.6.
I do this :
public static void main(String[] args) {
try {
System.out.println(new File("C:\\test a.xml").toURI().toURL());
} catch (Exception e) {
e.printStackTrace();
}
}
and I get this : file:/C:/test%20a.xml
How come the given URL doesn't have two slashes before the C: ? I expected file://C:.... Is it normal behaviour?
EDIT :
From Java source code : java.net.URLStreamHandler.toExternalForm(URL)
result.append(":");
if (u.getAuthority() != null && u.getAuthority().length() > 0) {
result.append("//");
result.append(u.getAuthority());
}
It seems that the Authority part of a file URL is null or empty, and thus the double slash is skipped. So what is the authority part of a URL and is it really absent from the file protocol?

That's an interesting question.
First things first: I get the same results on JRE6. I even get that when I lop off the toURL() part.
RFC2396 does not actually require two slashes. According to section 3:
The URI syntax is dependent upon the scheme. In general, absolute URI are written as follows:
<scheme>:<scheme-specific-part>
Having said that, RFC2396 has been superseded by RFC3986, which states
The generic URI syntax consists of a hierarchical sequence of components referred to as the scheme, authority, path, query, and fragment.
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
hier-part = "//" authority path-abempty
          / path-absolute
          / path-rootless
          / path-empty
The scheme and path components are required, though the path may be empty (no characters). When authority is present, the path must either be empty or begin with a slash ("/") character. When authority is not present, the path cannot begin with two slash characters ("//"). These restrictions result in five different ABNF rules for a path (Section 3.3), only one of which will match any given URI reference.
So, there you go. Since file URIs have no authority segment, they're forbidden from starting with //.
However, that RFC didn't come around until 2005, and Java references RFC2396, so I don't know why it's following this convention, as file URLs before the new RFC have always had two slashes.

To answer why you can have both:
file:/path/file
file:///path/file
file://localhost/path/file
RFC3986 (3.2.2. Host) states:
"If the URI scheme defines a default for host, then that default applies when the host subcomponent is undefined or when the registered name is empty (zero length). For example, the "file" URI scheme is defined so that no authority, an empty host, and "localhost" all mean the end-user's machine, whereas the "http" scheme considers a missing authority or empty host invalid."
So the "file" scheme translates file:///path/file to have a context of the end-user's machine even though the authority is an empty host.
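A small sketch can make the equivalence concrete: parsed with java.net.URI, all three spellings yield the same path, and only the authority part differs. The class name FileUriForms and the helper pathOf are hypothetical; /path/file is the placeholder path used above.

```java
import java.net.URI;

class FileUriForms {
    // Returns the path component of a URI string.
    static String pathOf(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        String[] forms = {
            "file:/path/file",           // no authority
            "file:///path/file",         // empty authority
            "file://localhost/path/file" // explicit localhost
        };
        for (String form : forms) {
            URI u = URI.create(form);
            // Print the host (null when absent) next to the shared path.
            System.out.println(form + " -> host=" + u.getHost() + ", path=" + u.getPath());
        }
    }
}
```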

As far as using it in a browser is concerned, it doesn't matter. I have typically seen file:///... but one, two or three '/' will all work. This makes me think (without looking at the java documentation) that it would be normal behavior.
