HDFS access from remote host through Java API, user authentication

I need to use an HDFS cluster from a remote desktop through the Java API. Everything works until it comes to write access: if I try to create any file, I receive an access permission exception. The path looks good, but the exception shows my remote desktop user name, which of course is not the user I need in order to access the target HDFS directory.
The questions are:
- Is there any way to present a different user name using 'simple' authentication in the Java API?
- Could you point me to a good explanation of the authentication / authorization schemes in Hadoop / HDFS, preferably with Java API examples?
Yes, I already know that 'whoami' could be overridden in this case using a shell alias, but I prefer to avoid solutions like this. I also want to avoid tricks such as pipes through SSH and scripts; I'd like to do everything using just the Java API.
Thank you in advance.

After some studying I came to the following solution:
I don't actually need the full Kerberos solution; for now it is enough that clients can run HDFS requests as any user. The environment itself is considered secure.
This leads to a solution based on the Hadoop UserGroupInformation class. In the future I can extend it to support Kerberos.
Sample code, probably useful both for 'fake authentication' and for remote HDFS access:
package org.myorg;
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileStatus;
public class HdfsTest {
    public static void main(String args[]) {
        try {
            // Act as the remote user "hbase" without presenting any credentials.
            UserGroupInformation ugi
                = UserGroupInformation.createRemoteUser("hbase");
            ugi.doAs(new PrivilegedExceptionAction<Void>() {
                public Void run() throws Exception {
                    Configuration conf = new Configuration();
                    conf.set("fs.defaultFS", "hdfs://1.2.3.4:8020/user/hbase");
                    conf.set("hadoop.job.ugi", "hbase");
                    FileSystem fs = FileSystem.get(conf);
                    // Create an empty file, then list the directory to confirm write access.
                    fs.createNewFile(new Path("/user/hbase/test"));
                    FileStatus[] status = fs.listStatus(new Path("/user/hbase"));
                    for (int i = 0; i < status.length; i++) {
                        System.out.println(status[i].getPath());
                    }
                    return null;
                }
            });
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Useful reference for those who have a similar problem:
Cloudera blog post "Authorization and Authentication In Hadoop". Short, and focused on a simple explanation of the Hadoop security approaches. It has no information specific to the Java API solution, but it is good for a basic understanding of the problem.
UPDATE:
An alternative for those who use the command line hdfs or hadoop utility, with no local user needed:
HADOOP_USER_NAME=hdfs hdfs dfs -put /root/MyHadoop/file1.txt /
What actually happens is that the local file is read according to your local permissions, but when the file is placed on HDFS you are authenticated as user hdfs.
This has properties quite similar to the API code shown above:
- You don't need sudo.
- You don't actually need a corresponding local user 'hdfs'.
- You don't need to copy anything or change permissions, because of the previous points.
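If the command line tool has to be started from Java rather than from a shell, the same environment variable can be set for the child process only. The following is a minimal sketch using only the JDK; the class name HdfsPutCommand and its method are hypothetical, and it assumes the hdfs executable is on the PATH of the machine running the JVM.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: prepare an "hdfs dfs -put" call with HADOOP_USER_NAME set for the child process. */
class HdfsPutCommand {

    /** Builds (but does not start) the process. All argument values are supplied by the caller. */
    static ProcessBuilder build(String hadoopUser, String localFile, String hdfsDir) {
        List<String> command = Arrays.asList("hdfs", "dfs", "-put", localFile, hdfsDir);
        ProcessBuilder pb = new ProcessBuilder(command);
        // Only the child process sees this variable; the JVM's own environment is untouched.
        pb.environment().put("HADOOP_USER_NAME", hadoopUser);
        // Forward the tool's stdout/stderr to this JVM's console.
        pb.inheritIO();
        return pb;
    }
}
```

A call such as HdfsPutCommand.build("hdfs", "/root/MyHadoop/file1.txt", "/").start().waitFor() would then run the upload and return the tool's exit code.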

Related

How to fix "org.apache.shiro.UnavailableSecurityManagerException" error with "SecurityUtils.getSubject()"

I am trying to read data from a web server located at some URL.
In our company I use an e-commerce API that works with the data from the web server.
To retrieve the data, I first need to create a pool of the data from the web server.
To create the pool of data, I need to configure a connection.
One part of configuring the connection is the function getSession(), which uses the Shiro API (org.apache.shiro.SecurityUtils),
so every time I try to connect to the web server and use its data, I get the exception "Exception in thread "main" org.apache.shiro.UnavailableSecurityManagerException: No SecurityManager accessible to the calling code, either bound to the org.apache.shiro.util.ThreadContext or as a vm static singleton. This is an invalid application configuration."
Before writing this question I looked at the logs and read about the classes and the problem described there.
This all runs on Windows 10, Java 8, Payara Server (GlassFish), with the EJB API and an e-commerce API.
Imports that I use:
import org.apache.shiro.SecurityUtils;
import org.apache.shiro.session.InvalidSessionException;
import org.apache.shiro.session.Session;
import org.apache.shiro.subject.Subject;
Code:
ContentConfiguration conf = new ContentConfiguration(
    getSessionId(),
    Constant.ENTITYMODELL,
    Constant.EMPTY,
    context);
protected static Session getSession()
{
    Subject subject = SecurityUtils.getSubject();
    if (subject.isAuthenticated())
        return subject.getSession();
    else
        return null;
}
Error Message
Exception in thread "main" org.apache.shiro.UnavailableSecurityManagerException: No SecurityManager accessible to the calling code, either bound to the org.apache.shiro.util.ThreadContext or as a vm static singleton. This is an invalid application configuration.
at org.apache.shiro.SecurityUtils.getSecurityManager(SecurityUtils.java:123)
at org.apache.shiro.subject.Subject$Builder.<init>(Subject.java:627)
at org.apache.shiro.SecurityUtils.getSubject(SecurityUtils.java:56)
at de.kolb.demo.data.ServiceLocator.getSessionId(ServiceLocator.java:15)
at de.kolb.demo.logic.CommonTest.getCommonData(CommonTest.java:32)
at de.kolb.demo.presentation.ContentDirector.main(ContentDirector.java:34)

I want to give an answer that reflects the situation in my company, but many of the principles I describe apply to a common problem with Shiro.
1. My problem is connected with getSessionId().
2. getSessionId() is a function from the company API. Inside this function I call org.apache.shiro.SecurityUtils.getSubject().
3. At this point I thought about the exception message "No SecurityManager accessible to the calling code...". Then I looked at the Shiro documentation. There I found that every application that uses authentication and authorization needs a configured "SecurityManager" with a "Realm" object.
4. This is only a brief outline; detailed instructions are available on the Shiro site.

Accessing Kerberized Hadoop cluster programmatically

We are trying to access a Kerberized Hadoop cluster (Cloudera distribution) from Java code, but we get the exception below.
Caused by: javax.security.auth.login.LoginException: Unable to obtain password from user
at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:897)
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:760)
at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
We have set the property "hadoop.security.authentication" to kerberos and fs.defaultFS to hdfs://devha:8020, and passed the keytab file path to UserGroupInformation.

First, read the comments on your question. Good stuff.
Taking a step back, since that information can be overwhelming: there are two possible ways to authenticate to a Hadoop cluster. A user will normally use a username (principal) and password. An application will normally use a principal and a keytab file. A keytab file is created by the Kerberos administrator using the 'kadmin' application.
Furthermore, there's the concept of a "Login" user, which is an application-wide default, and a "Current" user, which can be specific to your current need. You'll often use the former to access resources on your local cluster and the latter to access resources on an external cluster.
Since I'm using the latter I can give you a quick code snippet to get you started. For initialization:
UserGroupInformation.setConfiguration(configuration);
where "configuration" is either read from the standard location (/etc/hadoop) or generated on the fly. Note - that sets a static value so you need to be very careful!
For the individual user (application) I use
UserGroupInformation user = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytabFile);
There are several variants of this method. For example, do they take a username or a keytab file? Do they set the "Login" user or do they return a new UserGroupInformation object? Make sure you understand the consequences of the one you're using, since some of them set global values.
You now have to wrap your calls to the cluster in a doAs() call:
user.doAs(new PrivilegedExceptionAction<Void>() {
    public Void run() throws Exception {
        // do all of your hadoop calls here
        return null;
    }
});
I don't recall if you need to do this if you're always using the "Login" user. We need to support both local and external clusters and for us it's easiest to always wrap everything like this. It means we only need to set "user" once, at the start of the action.
See the resources mentioned above if you want details on user impersonation, using SSL encryption (rpc.privacy), etc.
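One commonly reported cause of "Unable to obtain password from user" is that the JVM cannot read the keytab at the path it was given, so the login module falls back to prompting for a password. Whether that applies to the question above is an assumption, not something the post confirms. A minimal JDK-only sketch of a check that could run before the keytab login call (the class name KeytabCheck is hypothetical):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch: fail fast with a clear message if the keytab file cannot be used. */
class KeytabCheck {

    /** Returns the absolute keytab path, or throws if the file is missing, unreadable or empty. */
    static Path requireReadableKeytab(String keytabFile) throws IOException {
        Path path = Paths.get(keytabFile).toAbsolutePath();
        if (!Files.isRegularFile(path)) {
            throw new IOException("Keytab not found: " + path);
        }
        if (!Files.isReadable(path)) {
            throw new IOException("Keytab not readable by this process: " + path);
        }
        if (Files.size(path) == 0) {
            throw new IOException("Keytab is empty: " + path);
        }
        return path;
    }
}
```

The returned path can be converted with toString() and passed as the keytabFile argument. The check does not verify that the keytab contains a key for the principal in use; that mismatch can produce the same exception.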

how to get the list of jobs in Jenkins using java?

I have downloaded and configured Jenkins on a server. My problem is that I need to access Jenkins through Java to perform some tasks, such as starting a job, returning the currently running job and returning the list of jobs on the server (all of that using JSON). I've tried several code samples such as this, but I'm getting no results, and I can't find a clear way to achieve this. Is there a clear API and example for doing it?

You can use the Jenkins API over XML:
import org.dom4j.io.*;
import org.dom4j.*;
import java.net.*;
import java.util.*;
public class Main {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://your-hudson-server.local/hudson/api/xml");
        Document dom = new SAXReader().read(url);
        for (Element job : (List<Element>) dom.getRootElement().elements("job")) {
            System.out.println(String.format("Job %s has status %s",
                job.elementText("name"), job.elementText("color")));
        }
    }
}
A complete example (with sources) can be found here.
If these examples don't work, you might have problems with Jenkins security (your client must provide login data before it can send the request) or with CSRF protection (you have to retrieve a token before the first request and add this token as a parameter to each request).
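Since the question asks for JSON and the login issue is a frequent stumbling block, here is a minimal sketch of an authenticated request for the job list. It assumes Java 11 or later (java.net.http), HTTP Basic authentication with a Jenkins user name and API token, and the /api/json endpoint under the server's base URL; the class name JenkinsRequests and its methods are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Sketch: build an authenticated GET for the Jenkins job list as JSON. */
class JenkinsRequests {

    /** Encodes "user:apiToken" as an HTTP Basic Authorization header value. */
    static String basicAuthHeader(String user, String apiToken) {
        String credentials = user + ":" + apiToken;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    /** Builds the request without sending it; baseUrl may end with or without a slash. */
    static HttpRequest jobListRequest(String baseUrl, String user, String apiToken) {
        String root = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return HttpRequest.newBuilder(URI.create(root + "/api/json"))
                .header("Authorization", basicAuthHeader(user, apiToken))
                .header("Accept", "application/json")
                .GET()
                .build();
    }
}
```

The request can be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()); the response body is a JSON document to be parsed with whichever JSON library the project already uses.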

How can I poll FTP location to trigger the changes in it?

I am trying to poll an FTP location.
I'm using Jenkins for continuous integration of the projects, so it would be helpful if anyone could suggest a Jenkins plugin or any other method to watch for changes in an FTP location.
I need to monitor the FTP location for changes and, when changes are found, trigger a build of another job.

Not sure how to do it with Jenkins, but if you want to monitor an FTP location for changes (i.e. receive notifications when files are added/removed/modified in a directory) using plain Java, then the following library can help you with the actual polling/notification mechanism: https://github.com/drapostolos/rdp4j (Remote Directory Poller for Java).
Simple usage example of the API:
package example;
import java.util.concurrent.TimeUnit;
import com.github.drapostolos.rdp4j.DirectoryPoller;
import com.github.drapostolos.rdp4j.spi.PolledDirectory;
public class FtpExample {
    public static void main(String[] args) throws Exception {
        String host = "ftp.mozilla.org";
        String workingDirectory = "pub/addons";
        String username = "anonymous";
        String password = "anonymous";
        // FtpDirectory and MyListener are your own classes; see the user guide notes below.
        PolledDirectory polledDirectory = new FtpDirectory(host, workingDirectory, username, password);
        DirectoryPoller dp = DirectoryPoller.newBuilder()
            .addPolledDirectory(polledDirectory)
            .addListener(new MyListener())
            .setPollingInterval(10, TimeUnit.MINUTES)
            .start();
        TimeUnit.HOURS.sleep(2);
        dp.stop();
    }
}
The RDP4J User Guide:
- provides an example of an FtpDirectory class which lists files in an FTP location using the Apache Commons FTPClient
- describes what events MyListener can listen for
- explains how to configure the DirectoryPoller

Not sure how you can achieve this in Jenkins. If I were to answer only the part about monitoring the FTP location, here is how you can do it.
Decide which programming language you want to use (Java, .NET etc.). Write code to monitor the FTP server (assuming it is a specific remote directory you want to monitor) and to execute the job that needs to be executed. Both the monitoring and the execution of the job need to be done in that programming language.
I am also assuming that you need a timer of some sort to do the monitoring; this can also be done in a programming language such as Java.
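The timer-plus-monitoring idea described above can be sketched with the JDK alone by comparing two successive directory listings. In this sketch the FTP access itself is left out: lister is a hypothetical callback that returns the current set of file names (for example from an FTP client library), and onChange is where a build trigger would go. The class name ListingPoller is also hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

/** Sketch: detect added and removed names between successive listings of a directory. */
class ListingPoller {

    /** Names present in current but not in previous. */
    static Set<String> added(Set<String> previous, Set<String> current) {
        Set<String> result = new TreeSet<>(current);
        result.removeAll(previous);
        return result;
    }

    /** Names present in previous but not in current. */
    static Set<String> removed(Set<String> previous, Set<String> current) {
        return added(current, previous);
    }

    /** Polls lister at a fixed delay and calls onChange(added, removed) when the listing differs. */
    static ScheduledExecutorService start(Supplier<Set<String>> lister,
                                          BiConsumer<Set<String>, Set<String>> onChange,
                                          long interval, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicReference<Set<String>> last = new AtomicReference<>(lister.get());
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                Set<String> now = lister.get();
                Set<String> before = last.getAndSet(now);
                Set<String> newNames = added(before, now);
                Set<String> goneNames = removed(before, now);
                if (!newNames.isEmpty() || !goneNames.isEmpty()) {
                    onChange.accept(newNames, goneNames);
                }
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so log and keep polling.
                e.printStackTrace();
            }
        }, interval, interval, unit);
        return scheduler;
    }
}
```

Comparing names only detects added and removed files; detecting modified files would additionally need sizes or timestamps in the listing. The returned scheduler uses a non-daemon thread, so the caller should call shutdown() when monitoring is no longer needed.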

mailto URI truncated between Java.Desktop and Windows/MS outlook

I'm trying to create an automated error reporting tool for our Java desktop app. The idea is to make it as easy as possible for customers to send us error reports whenever our application crashes.
Using the Desktop.mail API, I am able to craft messages that can be easily edited and sent by our users, but I'm running into system limitations on several platforms (notably Windows 7 and MS Outlook, which most customers are using).
When you run the example code below, you'll notice that the email message that is displayed truncates the included stack trace. I believe this has something to do with a maximum length of either command lines or URIs in the underlying systems.
Is there a better way to craft an email from an error report that is not subject to this limitation?
import java.awt.Desktop;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.net.URI;
import java.net.URLEncoder;
public class Scratchpad {
    public static void main(String[] args) throws Exception {
        try {
            generateLongStackTrace();
        } catch (Error e) {
            URI uri = createMailURI(e);
            // this will correctly pop up the system email client, but it will truncate the message
            // after about 2K of data (this seems system dependent)
            Desktop.getDesktop().mail(uri);
        }
    }
    // Will eventually generate a really long stack overflow error
    public static void generateLongStackTrace() throws Exception {
        generateLongStackTrace();
    }
    public static URI createMailURI(Error e) throws Exception {
        StringBuilder builder = new StringBuilder();
        builder.append("mailto:foo@example.com?body=");
        // encodes the stack trace in a mailto URI friendly form
        String encodedStackTrace = URLEncoder.encode(dumpToString(e), "utf-8").replace("+", "%20");
        builder.append(encodedStackTrace);
        return new URI(builder.toString());
    }
    // Dumps the offending stack trace into a string object.
    public static String dumpToString(Error e) {
        StringWriter sWriter = new StringWriter();
        PrintWriter writer = new PrintWriter(sWriter);
        e.printStackTrace(writer);
        writer.flush();
        return sWriter.toString();
    }
}

There are length limitations on admissible URLs in IE and on the length of a Windows command line (see here, here, here and here). It seems you have run into one of these (though I admit that I have not checked rigorously).
However, I think it is a plausible assumption that even if you could work around these limits, the length of a generic transmission buffer between desktop applications (unless you use a dedicated API for remote-controlling the target app) will be restricted somehow, without a loophole.
Therefore I'd suggest one of the following strategies:
1. Distribution through a web server.
Upload the data to be mailed to a web server instead, using the HTML form file upload technique.
Basically you have to forge a POST request with a payload whose content type is set to 'multipart/form-data'. Your content will need some wrapper data to conform syntactically to this MIME type.
The actual transmission can be initiated by means of the WinHttpRequest COM object under Windows or the curl command line program everywhere else.
Server-side processing can be delegated to a suitable CGI handler, which might, for example, produce a (short) link to download the data from the web server.
This link may be part of the HTTP response to the upload request, or you can generate it client-side in the proper format for publishing it on the web server unaltered.
Pro:
- This scheme is feasible; I have repeatedly applied it in enterprise projects.
- Data transmission can be secured through HTTPS.
Con:
- Requires a web server to implement.
2. Send a mail using an attachment (for some details see here):
Save the body of your message to some file on the desktop.
Generate a mailto link that references an attachment (instead of the bulk of your body).
Any decent mail client will be able to show the attachment inline if it has an elementary MIME type like 'text/plain'.
On Windows platforms you set it by choosing the proper file extension ('.txt').
Pro:
- Simple.
Con:
- Requires file system access on the client platform.
- Untested (at least by me).
Good luck!
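The file-based strategy can be sketched in plain Java as follows. This is only one possible reading of it: the sketch does not rely on any attachment parameter in the mailto link, since support for that varies between mail clients, and instead keeps the body short by pointing the user at the saved report file. It assumes Java 10 or later for URLEncoder.encode(String, Charset); the class name ErrorReportMail, the subject text and the body wording are hypothetical.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch: keep the mailto URI short by moving the stack trace into a text file. */
class ErrorReportMail {

    /** Writes the full stack trace to a temporary .txt file and returns its path. */
    static Path writeReport(Throwable error) throws IOException {
        StringWriter text = new StringWriter();
        PrintWriter writer = new PrintWriter(text);
        error.printStackTrace(writer);
        writer.flush();
        Path file = Files.createTempFile("error-report-", ".txt");
        Files.write(file, text.toString().getBytes(StandardCharsets.UTF_8));
        return file;
    }

    /** Builds a mailto URI whose body only tells the user which file to attach. */
    static URI shortMailUri(String recipient, Path reportFile) {
        String body = "Please attach the error report saved at: " + reportFile.toAbsolutePath();
        // URLEncoder produces form encoding, so '+' is turned into %20 for mailto use.
        String encoded = URLEncoder.encode(body, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create("mailto:" + recipient + "?subject=Error%20report&body=" + encoded);
    }
}
```

The resulting URI can be handed to Desktop.getDesktop().mail(uri) exactly as in the question; its length now depends only on the file path, not on the size of the stack trace.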
