Hadoop WebHDFS Java Client API enable SSL and Basic Authentication - java

I have a Spring Boot application that uses spring-yarn-boot:2.2.0.RELEASE to get access to a Hadoop filesystem (HDFS). Operations that I do are LISTSTATUS, GETFILESTATUS and OPEN (to read a file). HDFS URI is specified through application.properties:
spring.hadoop.fsUri=webhdfs://127.0.0.1:50070/webhdfs/v1/
I make a bean to which I provide Hadoop Configuration (that Spring somehow automagically prepares for me on startup):
SimplerFileSystem fs = new SimplerFileSystem(FileSystem.get(configuration));
FsShell shell = new FsShell(configuration);
And everything works well as expected, but the problems came when I got two new requirements.
First thing is that HDFS will be protected with SSL from now on. I can't seem to find any way to tell my application that the fsURI that starts with webhdfs:// is actually a https connection. And if I will give the https URL directly, I'll get an exception:
java.io.IOException: No FileSystem for scheme: https
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
... which is caused by that code: FileSystem.get(configuration).
This thing is driving me crazy, I don't seem to find a way to get pass this.
Second requirement is, that I need to authenticate myself against the WebHDFS with basic authentication. For this I also can't find any means in the client API.
Has anyone done it before and have any instructions to share? Or maybe anyone knows a different client API that I can use to accomplish this?
One option is to implement the REST calls myself with RestTemplate or any other REST service consumer API, but this looks like not-so-special use case so I'm really hoping that there is something that has been done already.
EDIT:
Found a solution to the HTTPS problem. One should use swebhdfs:// as url prefix and everything will work. Still havent found a solution to the Basic Auth problem.

Related

Spring boot Hadoop, Webhdfs and Apache Knox

I have a Spring boot application which is accessing HDFS through Webhdfs secured via Apache Knox secured by Kerberos. I created my own KnoxWebHdfsFileSystem with custom scheme (swebhdfsknox) as a subclass of WebHdfsFilesystem which only changes the URLs to contain the Knox proxy prefix. So it effectively remaps requests from form:
http://host:port/webhdfs/v1/...
to the Knox one:
http://host:port/gateway/default/webhdfs/v1/...
I do this by overriding two methods:
public URI getUri()
URL toUrl(Op op, Path fspath, Param<?, ?>... parameters)
So far so good. I let spring boot create FsShell for me and use it for various operations such as list files, mkdir etc. All work fine. Except copyFromLocal which as documented requires 2 steps and redirect. And on the last step when the filesystem tries to PUT to the final URL which received in Location header it fails with error:
org.apache.hadoop.security.AccessControlException: Authentication required
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:334) ~[hadoop-hdfs-2.6.0.jar:na]
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91) ~[hadoop-hdfs-2.6.0.jar:na]
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$FsPathOutputStreamRunner$1.close(WebHdfsFileSystem.java:787) ~[hadoop-hdfs-2.6.0.jar:na]
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:54) ~[hadoop-common-2.6.0.jar:na]
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112) ~[hadoop-common-2.6.0.jar:na]
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366) ~[hadoop-common-2.6.0.jar:na]
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338) ~[hadoop-common-2.6.0.jar:na]
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:302) ~[hadoop-common-2.6.0.jar:na]
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1889) ~[hadoop-common-2.6.0.jar:na]
at org.springframework.data.hadoop.fs.FsShell.copyFromLocal(FsShell.java:265) ~[spring-data-hadoop-core-2.2.0.RELEASE.jar:2.2.0.RELEASE]
at org.springframework.data.hadoop.fs.FsShell.copyFromLocal(FsShell.java:254) ~[spring-data-hadoop-core-2.2.0.RELEASE.jar:2.2.0.RELEASE]
I suspect the problem is the redirect somehow but can't figure out what might be the problem here. If I do the same requests via curl the file is successfully uploaded to HDFS.
This is a known issue with using existing Hadoop clients against Apache Knox using the HadoopAuth provider for kerberos on Knox. If you were to use curl or some other REST client it would likely work for you. The existing Hadoop java client doesn't expect a SPNEGO challenge from the DataNode - which is what the PUT in the send step is talking to. The DataNode expects the block access token/delegation token issued by the NameNode in the first step to be present. The Knox gateway however will require SPNEGO authentication for every request to that topology.
This is an issue that is on the roadmap to be addressed and will likely become hotter with interest moving more inside the cluster rather than only accessing resources through it from the outside.
The following JIRA tracks this item and as you can see from the title is related to DistCp which is a similar usecase:
https://issues.apache.org/jira/browse/KNOX-482
Feel free to take a look and lend a hand with testing or developing - it would all be most welcome!
Another possibility would be to change the Hadoop java client to deal with a SPNEGO challenge for the DataNode as well.

Hadoop Java Client API messes up my fsURI

I try to access HDFS in Hadoop Sandbox with the help of Java API from a Spring Boot application. To specify the URI to access the filesystem by I use a configuration parameter spring.hadoop.fsUri. HDFS itself is protected by Apache Knox (which to me should act just as a proxy that handles authentication). So if I call the proxy URI with curl, I use the exact same semantics as I would use without Apache Knox. Example:
curl -k -u guest:guest-password https://sandbox.hortonworks.com:8443/gateway/knox_sample/webhdfs/v1?op=GETFILESTATUS
Problem is that I can't access this gateway using the Hadoop client library. Root URL in the configuration parameter is:
spring.hadoop.fsUri=swebhdfs://sandbox.hortonworks.com:8443/gateway/knox_sample/webhdfs/v1
All the requests get Error 404 and the problem why is visible from the logs:
2015-11-19 16:42:15.058 TRACE 26476 --- [nio-8090-exec-9] o.a.hadoop.hdfs.web.WebHdfsFileSystem : url=https://sandbox.hortonworks.com:8443/webhdfs/v1/?op=GETFILESTATUS&user.name=tarmo
It destroys my originally provided fsURI. If I debugged what happens in the internals of Hadoop API, I see that it takes only the domain part sandbox.hortonworks.com:8443 and appends /webhdfs/v1/ to it from a constant. So whatever my original URI is, at the end it will be https://my-provided-hostname/webhdfs/v1. I understand that it might have something to do with the swebhdfs:// beginning but I can't use https:// directly because in that case an exception will be thrown how there is no such filesystem as https.
Googling this, I found an old mailing list thread where someone had the same problem, but no one ever answered the poster.
Does anyone know what can be done to solve this problem?
I apologize for being so late in this response.
You may be able to leverage the Apache Knox Default Topology URL. In your description, you happen to be using a topology called knox_sample. In order to access that topology as the "Default Topology", you would have to configure it as the default topology name. See: http://knox.apache.org/books/knox-0-7-0/user-guide.html#Default+Topology+URLs
The default "Default Topology" name is sandbox

"javax.xml.ws.WebServiceException: is not a valid service." proxy issue?

As a premise, I am not very experienced yet, but I have tried to read and search everything I possibly could, related to this topic, and still no luck.
I was given a simple client to call a webservice but once it was fully setup (which included the use of a certificate and a couple more properties to set) I got the error mentioned in the title:
javax.xml.ws.WebServiceException: {http://http://cert.controller.portaapplicativa.ictechnology.it//}MyService is not a valid service. Valid services are:
at com.sun.xml.ws.client.WSServiceDelegate.<init>(WSServiceDelegate.java:187)
at com.sun.xml.ws.client.WSServiceDelegate.<init>(WSServiceDelegate.java:159)
at com.sun.xml.ws.spi.ProviderImpl.createServiceDelegate(ProviderImpl.java:82)
at javax.xml.ws.Service.<init>(Service.java:56)
at package.client.wsimport.MyService..<init>(MyService.java:46)
at package.client.Client.doRicercaDEN(Client.java:55)
at package.client.Client.main(Client.java:36)
I tried generating the client again with JAX-WS:
java -classpath C:\Programmi\Java\jdk1.6.0_38\lib\tools.jar com.sun.tools.internal.ws.WsImport -verbose C:\WsdlFile.wsdl -p package.client.wsimport -s C:\tmp\ws\
And I get the same issue. I am using a local copy of the wsdl because wsimport doesn't seem to like the certificate I'm trying to set in the properties (I'm most likely doing something wrong, but I opted for the simple workaround, given I have more pressing issues).
Trying to use SoapUI to test the service, everything works fine, though I need to set the preferences for the proxy to "None".
So I tried to make sure the connection doesn't use any proxy in my client as well:
(...)
systemSettings.remove("http.proxyHost");
systemSettings.remove("http.proxyPort");
systemSettings.remove("https.proxyHost");
systemSettings.remove("https.proxyPort");
System.setProperty("http.nonProxyHosts","*");
System.setProperty("https.nonProxyHosts","*");
(BTW, before "*", which as I understand it should work as a wildcard for "every domain", I have tried specifying the specific domains as well)
Anyway, the result is always the same.
Is there something I am doing wrong, something left to try?
I doubt this is a proxy issue. If you can share the code you are using to create the Service object it might help.
As a kick start try reading the below thread Is not a valid service exception in JAX-WS
What I think is that the QName you have provided when creating the Service is not proper. To get the correct QName you might try to open the generated stub.
As it turns out, what I was missing was importing the certificate in my local truststore (or better, when I first tried doing so, I thought I was using the correct truststore, but I wasn't).
For anyone who may need it, here is an explanation of how to do that using keytool: http://javarevisited.blogspot.it/2012/03/add-list-certficates-java-keystore.html
Another option is to use specific GUI like Portecle.

How to secure a REST web service in Java EE 6

I have made a web application using Java EE 6 (using reference implementations) and I want to expose it as a REST web service.
The background is that I want to be able to retrieve data from the web application to a iOS app I made. The question is how would I secure the application? I only want my application to use the web service. Is that possible and how would I do this? I only need to know what I should search for and read and not the actual code.
Unfortunately, your webservice will never be completely secure but here are few of the basic things you can do:
Use SSL
Wrap all your (app) outbound payloads in POST requests. This will prevent casual snooping to find out how your webservice works (in order to reverse engineer the protocol).
Somehow validate your app's users. Ideally this will involve OAUTH for example using Google credentials, but you get the idea.
Now I'm going to point out why this won't be completely secure:
If someone gets a hold of your app and reverse engineers it, everything you just did is out the window. The only thing that will hold is your user validation.
Embedding a client certificate (as other people have pointed out) does nothing to help you in this scenario. If I just reverse enginneered your app, I also have your client certificate.
What can you do?
Validate the accounts on your backend and monitor them for anomalous usage.
Of course this all goes out the window when someone comes along, reverse engineers your app, builds another one to mimic it, and you wouldn't (generally) know any better. These are all just points to keep in mind.
Edit: Also, if it wasn't already obvious, use POST (or GET) requests for all app queries (to your server). This, combined with the SSL should thwart your casual snoopers.
Edit2: Seems as if I'm wrong re: POST being more secure than GET. This answer was quite useful in pointing that out. So I suppose you can use GET or POST interchangeably here.
Depends on how secure you want to make it.
If you don't really care, just embed a secret word in your application and include in all the requests.
If you care a little more do the above and only expose the service via https.
If you want it to be secure, issue a client certificate to your app and require a
valid client certificate to be present when the service is accessed.
my suggestions are:
use https instead of http. there are free ssl certificate avaliable,
get one and install.
use a complex path such as 4324234AA_fdfsaf/ as the root end point.
due to the nature of http protocol, the path part is encrypted in the https request. therefore it's very safe. there are ways to decrypt the request through man-in-the-middle attack but it requires full control over the client device including install an ilegal ssl certificate. but, i'd spend more time on my app to make it successful.
Create a rule on the machine which hosts your Web Service to only allow your application to access it through some port. In Amazon EC2, this is done creating a rule in the instance Security Group.
We have used RestEasy as a part to securing our exposed RESTful webservices. There should be lot of example out there but here is the one which might get you started.
http://howtodoinjava.com/2013/06/26/jax-rs-resteasy-basic-authentication-and-authorization-tutorial/
You can also use OAUTH:
http://oltu.apache.org/index.html

Uploading to Youtube via a proxy using the Java Youtube API

So I want to write a servlet which uploads a video to a youtube channel using the Java API, but I can't seem to find a way of specifying that I want to go through a proxy server. I've seen an example on this site where someone managed to do this using C#, but the Classes they used don't seem to exist in the Java API. Has anybody managed to successfully do this?
YouTubeService service = new YouTubeService(clientID, developerKey);
I'm new here so I'm unable to comment on posts (and a little late on this topic), but Jesper, I believe this is the C# sample that the original poster was talking about: How to upload to YouTube using the API via a Proxy Server
I can see no "direct" way of porting that example to Java though, since the GDataRequestFactory doesn't seem to have any proxy-related fields.
I was also having issues with the Java client Library with proxy in our application. Basically, the library picks up the global Java proxy settings:
System.getProperty("http.proxyHost");
System.getProperty("http.proxyPort");
but for some reason not everywhere. To be more precise, even with a proxy server properly configured in Java, YouTube authentication (calling service.setUserCredentials("login", "pwd")) would use a direct connection and ignore the proxy. But a video upload (calling service.insert(...)) would use the proxy correctly.
With the help of folks at the official YouTube API mailing list, I was able to nail this down. The issue is that the authentication is performed using SSL (HTTPS) and since there is a different set of properties for the HTTPS proxy, this didn't work. The fix is to simply set https.proxy* properties as well (in addition to http.proxy*), so that these point to a valid proxy server too:
System.getProperty("https.proxyHost");
System.getProperty("https.proxyPort");

Categories