Hadoop Java Client API messes up my fsURI

I am trying to access HDFS in the Hadoop Sandbox with the Java API from a Spring Boot application. I specify the URI of the filesystem through the configuration parameter spring.hadoop.fsUri. HDFS itself is protected by Apache Knox (which, to me, should act just as a proxy that handles authentication). When I call the proxy URI with curl, I use exactly the same semantics as I would without Apache Knox. Example:
curl -k -u guest:guest-password https://sandbox.hortonworks.com:8443/gateway/knox_sample/webhdfs/v1?op=GETFILESTATUS
The problem is that I can't access this gateway using the Hadoop client library. The root URL in the configuration parameter is:
spring.hadoop.fsUri=swebhdfs://sandbox.hortonworks.com:8443/gateway/knox_sample/webhdfs/v1
All requests return a 404 error, and the reason is visible in the logs:
2015-11-19 16:42:15.058 TRACE 26476 --- [nio-8090-exec-9] o.a.hadoop.hdfs.web.WebHdfsFileSystem : url=https://sandbox.hortonworks.com:8443/webhdfs/v1/?op=GETFILESTATUS&user.name=tarmo
It destroys the fsURI I originally provided. Debugging the internals of the Hadoop API, I can see that it takes only the authority part, sandbox.hortonworks.com:8443, and appends /webhdfs/v1/ to it from a constant. So whatever my original URI is, it ends up as https://my-provided-hostname/webhdfs/v1. I understand it might have something to do with the swebhdfs:// prefix, but I can't use https:// directly because in that case an exception is thrown saying there is no such filesystem as https.
Googling this, I found an old mailing list thread where someone had the same problem, but no one ever answered the poster.
Does anyone know what can be done to solve this problem?

I apologize for being so late in this response.
You may be able to leverage the Apache Knox Default Topology URL. In your description, you happen to be using a topology called knox_sample. In order to access that topology as the "Default Topology", you would have to configure it as the default topology name. See: http://knox.apache.org/books/knox-0-7-0/user-guide.html#Default+Topology+URLs
The default "Default Topology" name is sandbox

Related

Why is GetServerAuthCodeResult Deprecated? How can I do something equivalent in an Installed Application?

Following this post: http://android-developers.blogspot.com/2016/01/play-games-permissions-are-changing-in.html I have obtained a single use authorization code for use on my backend server as follows:
import com.google.android.gms.games.Games;

// later, off the UI thread (await() blocks)
Games.GetServerAuthCodeResult result =
        Games.getGamesServerAuthCode(gameHelper.getApiClient(), server_client_id).await();
if (result.getStatus().isSuccess()) {
    String authCode = result.getCode();
    // Send code to server...
}
This seems to work fine, but it raises a couple of questions:
1) getGamesServerAuthCode and GetServerAuthCodeResult are marked as deprecated. Why? Should I be using something else instead?
2) How would I do something equivalent in a non-Android installed Java application? I am able to obtain a token in the client application, but I also need to obtain a single-use code to pass to my backend server as above. I can't find an equivalent function to get a server auth code. (I am using com.google.api.client.extensions.java6.auth.oauth2.)
I am basically trying to follow this flow: https://developers.google.com/games/services/web/serverlogin but in Java, NOT JavaScript. I am attempting to do this in both an Android app and a desktop Java app.
1) Yes, on Android use GetServerAuthCodeResult even though it is still marked as deprecated. It is the way recommended by Google, and it seems they simply forgot to remove the deprecation annotation when releasing it to the general public.
2) For desktop applications you can follow the instructions here: https://developers.google.com/identity/protocols/OAuth2InstalledApp
Basically, from your app you open the system browser (embedded webviews are discouraged) and make an HTTPS request to the https://accounts.google.com/o/oauth2/v2/auth endpoint. In the request you supply a local redirect URI parameter, e.g. http://127.0.0.1:9004 (you should query your platform for the relevant loopback IP and start an HTTP listener on a random available port). The authorization code is sent to your local HTTP listener once the user has given consent, or an error such as error=access_denied if the user declined the request. Your application must be listening on this local web server to retrieve the response with the auth code. You also have the option to redirect to a server URI claimed directly by your app; see the docs linked above. When your app receives the authorization response, for best usability it should respond with an HTML page instructing the user to close the browser tab and return to your app. Also, if you want the Games scope, make sure you use https://www.googleapis.com/auth/games as the scope in the request. Example below, with line breaks and spaces added for readability.
https://accounts.google.com/o/oauth2/v2/auth?
scope=https://www.googleapis.com/auth/games&
redirect_uri=http://127.0.0.1:9004&
response_type=code&
client_id=812741506391-h38jh0j4fv0ce1krdkiq0hfvt6n5amrf.apps.googleusercontent.com
Please note that I think you'll have to create and link an app of type Other in the Google Play Developer Console (under linked apps) for the localhost redirection to work. Use type Web if you plan to redirect to a server URI directly, and add your server URI to Authorized redirect URIs in the API Manager under the Credentials section.
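If it helps, here is a rough, minimal sketch of the local-listener part in plain JDK Java (class and variable names are mine, the client ID is a placeholder, and error handling, state and PKCE parameters are omitted):

import com.sun.net.httpserver.HttpServer;
import java.awt.Desktop;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class LoopbackAuthCodeReceiver {
    public static void main(String[] args) throws Exception {
        // Listen on the loopback interface; port 0 picks a random free port.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        BlockingQueue<String> codeQueue = new ArrayBlockingQueue<>(1);

        server.createContext("/", exchange -> {
            // Google redirects here with ?code=... (or ?error=access_denied).
            String query = exchange.getRequestURI().getQuery();
            byte[] body = "You may close this tab and return to the application."
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
            if (query != null && query.contains("code=")) {
                codeQueue.offer(query.replaceAll(".*code=([^&]+).*", "$1"));
            }
        });
        server.start();

        String redirectUri = "http://127.0.0.1:" + server.getAddress().getPort();
        String authUrl = "https://accounts.google.com/o/oauth2/v2/auth"
                + "?scope=https://www.googleapis.com/auth/games"
                + "&redirect_uri=" + redirectUri
                + "&response_type=code"
                + "&client_id=YOUR_CLIENT_ID";  // replace with your own client ID

        Desktop.getDesktop().browse(URI.create(authUrl));  // open the system browser

        String authCode = codeQueue.take();  // blocks until the redirect arrives
        server.stop(0);
        System.out.println("Authorization code: " + authCode);
        // Exchange the code on your backend server, as in the web flow.
    }
}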
There is finally a proper answer to part 1) of this question!
In the release notes of gms 10.2.0
https://developers.google.com/android/guides/releases#february_2017_-_v102
the new method of obtaining a server code is described. A good example of how to do this is provided here:
https://github.com/playgameservices/clientserverskeleton
I ended up updating Google's BaseGameUtils to follow the example above.
Still not sure of the proper way to do this for part 2) of the question; at the moment I am sending the token to the server, which works but is probably unsafe.
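For completeness on part 1), the replacement flow goes through the Google Sign-In API with requestServerAuthCode(). The sketch below uses class names from the current play-services-auth sign-in client, which may differ slightly from the 10.2.0-era sample linked above, so treat it as an approximation rather than the authoritative version (RC_SIGN_IN and YOUR_WEB_CLIENT_ID are placeholders):

import com.google.android.gms.auth.api.signin.GoogleSignIn;
import com.google.android.gms.auth.api.signin.GoogleSignInAccount;
import com.google.android.gms.auth.api.signin.GoogleSignInClient;
import com.google.android.gms.auth.api.signin.GoogleSignInOptions;
import com.google.android.gms.common.api.ApiException;

// Request a server auth code for the Games sign-in; use the web/server OAuth client ID here.
GoogleSignInOptions gso =
        new GoogleSignInOptions.Builder(GoogleSignInOptions.DEFAULT_GAMES_SIGN_IN)
                .requestServerAuthCode("YOUR_WEB_CLIENT_ID")
                .build();

GoogleSignInClient signInClient = GoogleSignIn.getClient(activity, gso);
activity.startActivityForResult(signInClient.getSignInIntent(), RC_SIGN_IN);

// Later, in onActivityResult(requestCode, resultCode, data):
GoogleSignInAccount account =
        GoogleSignIn.getSignedInAccountFromIntent(data).getResult(ApiException.class);
String serverAuthCode = account.getServerAuthCode();  // pass this to your backend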

Spring boot Hadoop, Webhdfs and Apache Knox

I have a Spring Boot application which accesses HDFS through WebHDFS, proxied via Apache Knox, which is in turn secured by Kerberos. I created my own KnoxWebHdfsFileSystem with a custom scheme (swebhdfsknox) as a subclass of WebHdfsFileSystem that only changes the URLs to contain the Knox proxy prefix. So it effectively remaps requests of the form:
http://host:port/webhdfs/v1/...
to the Knox one:
http://host:port/gateway/default/webhdfs/v1/...
I do this by overriding two methods:
public URI getUri()
URL toUrl(Op op, Path fspath, Param<?, ?>... parameters)
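For readers who want to see roughly what that remapping looks like, here is an untested sketch based only on the signatures quoted above; the topology name is assumed and this is not the poster's actual code. Note that toUrl() may be package-private depending on the Hadoop version, which is why the sketch sits in the org.apache.hadoop.hdfs.web package (Op here is HttpOpParam.Op):

package org.apache.hadoop.hdfs.web;

import java.io.IOException;
import java.net.URL;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.web.resources.HttpOpParam;
import org.apache.hadoop.hdfs.web.resources.Param;

public class KnoxWebHdfsFileSystem extends WebHdfsFileSystem {
    // Assumed topology prefix; "default" would be whatever topology Knox exposes.
    private static final String KNOX_PREFIX = "/gateway/default";

    @Override
    public String getScheme() {
        return "swebhdfsknox";
    }

    @Override
    URL toUrl(HttpOpParam.Op op, Path fspath, Param<?, ?>... parameters) throws IOException {
        URL url = super.toUrl(op, fspath, parameters);
        // Re-insert the Knox gateway prefix in front of the standard /webhdfs/v1/... path.
        return new URL(url.getProtocol(), url.getHost(), url.getPort(),
                KNOX_PREFIX + url.getFile());
    }
}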
So far so good. I let Spring Boot create an FsShell for me and use it for various operations such as listing files, mkdir, etc. All work fine, except copyFromLocal, which, as documented, requires two steps and a redirect. On the last step, when the filesystem tries to PUT to the final URL it received in the Location header, it fails with this error:
org.apache.hadoop.security.AccessControlException: Authentication required
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:334) ~[hadoop-hdfs-2.6.0.jar:na]
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91) ~[hadoop-hdfs-2.6.0.jar:na]
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$FsPathOutputStreamRunner$1.close(WebHdfsFileSystem.java:787) ~[hadoop-hdfs-2.6.0.jar:na]
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:54) ~[hadoop-common-2.6.0.jar:na]
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112) ~[hadoop-common-2.6.0.jar:na]
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366) ~[hadoop-common-2.6.0.jar:na]
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338) ~[hadoop-common-2.6.0.jar:na]
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:302) ~[hadoop-common-2.6.0.jar:na]
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1889) ~[hadoop-common-2.6.0.jar:na]
at org.springframework.data.hadoop.fs.FsShell.copyFromLocal(FsShell.java:265) ~[spring-data-hadoop-core-2.2.0.RELEASE.jar:2.2.0.RELEASE]
at org.springframework.data.hadoop.fs.FsShell.copyFromLocal(FsShell.java:254) ~[spring-data-hadoop-core-2.2.0.RELEASE.jar:2.2.0.RELEASE]
I suspect the redirect is somehow the problem, but I can't figure out what exactly goes wrong. If I make the same requests via curl, the file is uploaded to HDFS successfully.
This is a known issue with using existing Hadoop clients against Apache Knox when the HadoopAuth provider for Kerberos is used on Knox. If you were to use curl or some other REST client, it would likely work for you. The existing Hadoop Java client doesn't expect a SPNEGO challenge from the DataNode - which is what the PUT in the data-send step is talking to. The DataNode expects the block access token/delegation token issued by the NameNode in the first step to be present. The Knox gateway, however, requires SPNEGO authentication for every request to that topology.
This is on the roadmap to be addressed, and will likely get hotter as interest moves toward using Knox from inside the cluster rather than only accessing resources through it from the outside.
The following JIRA tracks this item; as you can see from the title, it is related to DistCp, which is a similar use case:
https://issues.apache.org/jira/browse/KNOX-482
Feel free to take a look and lend a hand with testing or developing - it would all be most welcome!
Another possibility would be to change the Hadoop java client to deal with a SPNEGO challenge for the DataNode as well.

Hadoop WebHDFS Java Client API enable SSL and Basic Authentication

I have a Spring Boot application that uses spring-yarn-boot:2.2.0.RELEASE to access a Hadoop filesystem (HDFS). The operations I perform are LISTSTATUS, GETFILESTATUS and OPEN (to read a file). The HDFS URI is specified through application.properties:
spring.hadoop.fsUri=webhdfs://127.0.0.1:50070/webhdfs/v1/
I create a bean to which I provide the Hadoop Configuration (which Spring somehow automagically prepares for me on startup):
SimplerFileSystem fs = new SimplerFileSystem(FileSystem.get(configuration));
FsShell shell = new FsShell(configuration);
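For reference, the wiring around those two lines can look roughly like this; a sketch that assumes spring-yarn-boot auto-configures the org.apache.hadoop.conf.Configuration bean from the spring.hadoop.* properties (bean names and the class name are arbitrary):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.springframework.context.annotation.Bean;
import org.springframework.data.hadoop.fs.FsShell;
import org.springframework.data.hadoop.fs.SimplerFileSystem;

// Spring's @Configuration is fully qualified to avoid clashing with Hadoop's Configuration class.
@org.springframework.context.annotation.Configuration
public class HdfsClientConfig {

    // The injected Configuration is the one built from the spring.hadoop.* properties.
    @Bean
    public SimplerFileSystem simplerFileSystem(Configuration hadoopConfiguration) throws Exception {
        return new SimplerFileSystem(FileSystem.get(hadoopConfiguration));
    }

    @Bean
    public FsShell fsShell(Configuration hadoopConfiguration) {
        return new FsShell(hadoopConfiguration);
    }
}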
Everything works as expected, but problems arose when I got two new requirements.
The first is that HDFS will be protected with SSL from now on. I can't seem to find any way to tell my application that the fsURI starting with webhdfs:// is actually an HTTPS connection. And if I give the https URL directly, I get an exception:
java.io.IOException: No FileSystem for scheme: https
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
... which is caused by this code: FileSystem.get(configuration).
This is driving me crazy; I can't seem to find a way to get past this.
The second requirement is that I need to authenticate against WebHDFS with basic authentication. For this I also can't find any means in the client API.
Has anyone done this before and have any instructions to share? Or maybe anyone knows a different client API that I can use to accomplish this?
One option is to implement the REST calls myself with RestTemplate or any other REST client API, but this does not look like such a special use case, so I'm really hoping something has been done already.
EDIT:
Found a solution to the HTTPS problem: use swebhdfs:// as the URL prefix and everything will work. Still haven't found a solution to the Basic Auth problem.

NRPE Protocol description

I need to query an NRPE Nagios server remotely from a Java application, just as check_nrpe would do:
check_nrpe -H 192.***.***.*** -p 56** -c "check_load"
When I say "from a Java application" I mean I want the results to be received and processed at my Java application. The first idea I had was to call the "check_nrpe" command from my application and retrieve its output and return value but I would like more a standalone solution where no external programs are called.
I don't need to wait for state changes, just eventually check the monitor state. Since I have been unable to locate any Java library (should I try JNRPE?), I would like to implement the protocol check_nrpe and nrpe daemon use to communicate.
Have any of you tried this before? In that case, do you have a description of this protocol?
If your answers are negative I will try to analize the protocol using whireshark but any clue will be much appreciated.
An explanation of the NRPE protocol can be found on Andreas Marschke's blog, "The NRPE Protocol explained" (available on GitHub too).
Anyway, JNRPE has a full working implementation of the protocol. You can download the jcheck_nrpe-2.0.3-RC5 source code and take a look at the jcheck_nrpe-2.0.3-RC5\src\main\java\it\jnrpe\client\JNRPEClient.java class for a sample client that uses jnrpe-lib-1.0.1-RC5.
jnrpe-lib has two concrete classes that implement the protocol request and response:
JNRPERequest.java
JNRPEResponse.java
The full protocol implementation classes can be found in the jnrpe-lib-1.0.1-RC5\src\main\java\it\jnrpe\net\ folder.
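If you do end up implementing the wire format yourself, the NRPE v2 query packet is fairly simple. Below is a minimal, untested sketch under these assumptions: v2 framing of roughly 1036 bytes on the wire (2-byte version, 2-byte type, 4-byte CRC32, 2-byte result code, 1024-byte command buffer, plus struct padding), network byte order, and the CRC computed over the whole packet with the CRC field zeroed. The host and port are placeholders, and many NRPE daemons additionally require SSL with anonymous DH ciphers, which needs extra JSSE setup not shown here:

import java.io.OutputStream;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class NrpeQuery {
    public static void main(String[] args) throws Exception {
        byte[] packet = buildQueryPacket("check_load");
        try (Socket socket = new Socket("192.0.2.10", 5666)) {   // placeholder host, default NRPE port
            OutputStream out = socket.getOutputStream();
            out.write(packet);
            out.flush();
            byte[] response = socket.getInputStream().readNBytes(1036);  // Java 9+
            // Bytes 8-9 of the response hold the result code (0=OK, 1=WARNING, 2=CRITICAL).
            short resultCode = ByteBuffer.wrap(response, 8, 2).getShort();
            String output = new String(response, 10, 1024, StandardCharsets.US_ASCII).trim();
            System.out.println(resultCode + ": " + output);
        }
    }

    static byte[] buildQueryPacket(String command) {
        ByteBuffer buf = ByteBuffer.allocate(1036);        // v2 packet size incl. struct padding
        buf.putShort((short) 2);                            // packet_version = 2
        buf.putShort((short) 1);                            // packet_type = 1 (query)
        buf.putInt(0);                                      // crc32 placeholder, filled in below
        buf.putShort((short) 0);                            // result_code (unused in queries)
        byte[] cmd = command.getBytes(StandardCharsets.US_ASCII);
        buf.put(cmd, 0, Math.min(cmd.length, 1023));        // null-terminated command buffer
        byte[] packet = buf.array();

        CRC32 crc = new CRC32();                            // CRC over the packet with crc field = 0
        crc.update(packet);
        ByteBuffer.wrap(packet, 4, 4).putInt((int) crc.getValue());
        return packet;
    }
}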

servlets behind a proxy: getting un-proxied URL

Is there anything in the Servlet spec, Tomcat, or Wicket that will allow a webapp running behind mod_proxy to determine the non-proxied URL of the request?
We need to send out emails with links in them. I had been using the following bit of Wicket to construct URLs to specific pages in the app:
String relURL = RequestCycle.get().getRequest().getRelativePathPrefixToWicketHandler();
RequestUtils.toAbsolutePath(relURL);
Since the emails don't go back out through the proxy, of course the URLs don't get re-written, and end up looking like http://localhost/....
Right now the best I can do is to hard-code the URLs to our production server, but that's setting us up for some debugging headaches when running on dev/test machines.
Using InetAddress.getLocalHost().getHostName() isn't really a solution, since that's likely to return prod1.mydomain.com or somesuch, rather than mydomain.dom, from which the request likely originated.
As answered for the question Retain original request URL on mod_proxy redirect:
If you're running Apache >= 2.0.31 then you might try to set the ProxyPreserveHost directive as described here. This should pass the original Host header through mod_proxy into your application, and normally the request URL will be rebuilt there (in your Servlet container) using the Host header, so the absolute URL should be built using the host and path info from "before" the proxy.
Is there anything in the Servlet spec, Tomcat, or Wicket that will allow a webapp running behind mod_proxy to determine the non-proxied URL of the request?
No. If the reverse proxy doesn't put the information that you require into the message headers before passing them on, there's no way to recover it.
You need to look at the Apache Httpd documentation to figure out how to get the front-end to put the information that you need into the HTTP request headers on the way through. (It can be done. I just can't recall the details.)
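As a sketch of the application side once the front-end forwards that information: mod_proxy_http normally adds X-Forwarded-Host on its own, while X-Forwarded-Proto usually has to be added explicitly (e.g. with a RequestHeader directive). The header names and the fallback behaviour below are assumptions, not something from the answers above:

import javax.servlet.http.HttpServletRequest;

public final class ProxyUrls {

    /** Rebuilds the externally visible URL of a request that arrived through a reverse proxy. */
    public static String externalUrl(HttpServletRequest request) {
        // Headers set by the front-end; fall back to local values when absent (dev/test without a proxy).
        String host = request.getHeader("X-Forwarded-Host");
        String scheme = request.getHeader("X-Forwarded-Proto");
        if (host == null) {
            host = request.getServerName() + ":" + request.getServerPort();
        }
        if (scheme == null) {
            scheme = request.getScheme();
        }

        StringBuilder url = new StringBuilder(scheme).append("://").append(host)
                .append(request.getRequestURI());
        if (request.getQueryString() != null) {
            url.append('?').append(request.getQueryString());
        }
        return url.toString();
    }

    private ProxyUrls() {
    }
}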
