WebHDFS Java client not handling Kerberos Tokens correctly - java

I'm trying to run a long-lived WebHDFS client (I'm building a framework in front of HDFS), but my tokens expire after one day (default Kerberos configuration here). At first I tried running a thread that calls
UserGroupInformation.getCurrentUser().checkTGTAndReloginFromKeytab();
Even though I see the TGT relogin after 21 hours, after 24 hours my WebHDFS FileSystem gets stuck on "token not found in the cache" (an error meaning that the server has already deleted my token).
Looking at the code at https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
I found the method "replaceExpiredDelegationToken". But judging by "runWithRetry", it is only called if "OPGETDELEGATIONTOKEN" fails (because getRequireAuth is FALSE for all other operations), which basically forces my client to call getDelegationToken at least once a day so that my token gets renewed.
For now I'll check whether the FS is a WebHDFS service and then, every hour, do:
if (hdfsFileSystem instanceof WebHdfsFileSystem)
{
WebHdfsFileSystem tmpFS = (WebHdfsFileSystem) hdfsFileSystem;
tmpFS.setDelegationToken(tmpFS.getDelegationToken(null));
}
Is there a better way to force delegation token renewal? (or to have long-lived clients)
Thanks!

After two days of testing (long enough for the Kerberos ticket to expire), calling
if (hdfsFileSystem instanceof WebHdfsFileSystem)
{
WebHdfsFileSystem tmpFS = (WebHdfsFileSystem) hdfsFileSystem;
tmpFS.setDelegationToken(tmpFS.getDelegationToken(null));
}
once an hour seems to work fine. IMO this should be done at the HDFS level, but well... it will be done at the framework level for us :)
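The hourly call described above could be driven by a scheduler instead of a hand-written thread. The following is only a sketch: PeriodicTokenRefresher is a hypothetical helper name, it uses nothing but the JDK, and the renewal itself is passed in as a Runnable so the WebHdfsFileSystem snippet from this thread can be plugged in unchanged.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Runs a renewal action at a fixed period on a daemon thread (hypothetical helper). */
final class PeriodicTokenRefresher implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "token-refresher");
                t.setDaemon(true); // do not keep the JVM alive just for renewals
                return t;
            });

    /** Schedules renewAction; the first run happens immediately. */
    void start(Runnable renewAction, long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                renewAction.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so log and carry on.
                System.err.println("Token renewal failed, will retry next period: " + e);
            }
        }, 0, period, unit);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Usage would be along the lines of refresher.start(renewal, 1, TimeUnit.HOURS), where renewal wraps the instanceof check and the setDelegationToken(getDelegationToken(null)) call. Since getDelegationToken can throw an IOException, the Runnable would have to catch it or rethrow it as an unchecked exception.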

Related

Issue with SAML response notBeforeDate

I am getting the SAMLException "Current date is before the notBeforeDate" during authentication. The current date and the "notBeforeDate" are the same for 90% of login attempts, which results in the error. What could be the reason for this error?
In short: this is most likely caused by time drift on the IdP/SP servers.
If you have access to these servers, make sure they are properly synchronized with NTP servers, or adjust the time manually.
If you don't, inform the IT department responsible for the IdP side or the SP side and ask them to check the server time synchronization.
The error refers to this part of the SAML response:
<saml:Subject>
<saml:NameID SPNameQualifier="http://sp.example.com/demo1/metadata.php" Format="urn:oasis:names:tc:SAML:2.0:nameid-format:transient">_ce3d2948b4cf20146dee0a0b3dd6f69b6cf86f62d7</saml:NameID>
<saml:SubjectConfirmation Method="urn:oasis:names:tc:SAML:2.0:cm:bearer">
<saml:SubjectConfirmationData NotOnOrAfter="2014-07-18T06:21:48Z" Recipient="http://sp.example.com/demo1/index.php?acs" InResponseTo="ONELOGIN_4fee3b046395c4e751011e97f8900b5273d56685"/>
</saml:SubjectConfirmation>
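To see why even a small drift triggers the error, here is a minimal sketch of the kind of validity check a service provider performs. It is not the API of any particular SAML library; ValidityWindow and allowedSkew are illustrative names, and the skew tolerance is an assumption about how such checks are commonly softened.

```java
import java.time.Duration;
import java.time.Instant;

/** Checks a SAML-style validity window with a clock-skew allowance (illustrative only). */
final class ValidityWindow {
    private ValidityWindow() {}

    /**
     * Returns true if now lies inside [notBefore, notOnOrAfter),
     * with the window widened by allowedSkew on both sides.
     */
    static boolean isValid(Instant now, Instant notBefore, Instant notOnOrAfter,
                           Duration allowedSkew) {
        boolean tooEarly = now.plus(allowedSkew).isBefore(notBefore);
        boolean tooLate = !now.minus(allowedSkew).isBefore(notOnOrAfter);
        return !tooEarly && !tooLate;
    }
}
```

With a zero skew, an SP whose clock runs only two seconds behind the IdP rejects a freshly issued assertion as "before notBefore". Synchronizing both clocks removes the cause; a tolerance of this kind only hides small residual differences.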

Java GSS-API Service Ticket not saved in Credentials Cache using Java

I have created 2 demo Kerberos Clients using the GSS-API.
One in Python3, the second in Java.
Both clients seem to be broadly equivalent, and both "work" in that I get a service ticket that is accepted by my Java GSS-API Service Principal.
However on testing I noticed that the Python client saves the service ticket in the kerberos credentials cache, whereas the Java client does not seem to save the ticket.
I use "klist" to view the contents of the credential cache.
My clients are running on a Lubuntu 17.04 Virtual Machine, using FreeIPA as the Kerberos environment. I am using OpenJDK 8 u131.
Question 1: Does the Java GSS-API not save service tickets to the credentials cache? Or can I change my code so it does so?
Question 2: Is there any downside to the fact that the service ticket is not saved to the cache?
My assumption is that cached service tickets reduce interaction with the KDC, but comments on How to save Kerberos Service Ticket using a Windows Java client? suggest that is not the case, but this Microsoft technote says "The client does not need to go back to the KDC each time it wants access to this particular server".
Question 3: The cached service tickets from the python client vanish after some minutes - long before the expiry date. What causes them to vanish?
Python code
#!/usr/bin/python3.5
import gssapi
from io import BytesIO
server_name = 'HTTP/app-srv.acme.com@ACME.COM'
service_name = gssapi.Name(server_name)
client_ctx = gssapi.SecurityContext(name=service_name, usage='initiate')
initial_client_token = client_ctx.step()
Java Code
System.setProperty("java.security.krb5.conf","/etc/krb5.conf");
System.setProperty("javax.security.auth.useSubjectCredsOnly","false");
GSSManager manager = GSSManager.getInstance();
GSSName clientName;
GSSContext context = null;
//try catch removed for brevity
GSSName serverName =
manager.createName("HTTP/app-srv.acme.com@ACME.COM", null);
Oid krb5Oid = new Oid("1.2.840.113554.1.2.2");
//use default credentials
context = manager.createContext(serverName,
krb5Oid,
null,
GSSContext.DEFAULT_LIFETIME);
context.requestMutualAuth(false);
context.requestConf(false);
context.requestInteg(true);
byte[] token = new byte[0];
token = context.initSecContext(token, 0, token.length);
Edit:
While the original question focusses on the use of the Java GSS-API to build a Java Kerberos Client, GSS is not a must. I am open to other Kerberos approaches that work on Java. Right now I am experimenting with Apache Kerby kerb-client.
So far Java GSS-API seems to have 2 problems:
1) It uses the credentials cache to get the TGT (Ok), but not to cache service-tickets (Not Ok).
2) It cannot access credential caches of type KEYRING. (Confirmed by behaviour, by debugging the Java runtime security classes, and by comments in that code.) For the Lubuntu / FreeIPA combination I am using, KEYRING was the out-of-the-box default. This won't apply to Windows, and may not apply to other Linux Kerberos combinations.
Edit 2:
The question I should have asked is:
How do I stop my KDC from being hammered with repeated SGT requests because Java GSS is not using the credentials cache?
I leave my original answer in place at the bottom, because it largely focuses on the original question.
After another round of deep debugging and testing, I have found an acceptable solution to the root problem.
Using the Java GSS-API with JAAS, as opposed to "pure" GSS without JAAS as in my original solution, makes a big difference!
Yes, existing Service Tickets (SGTs) that may be in the credentials cache are not loaded,
nor are any newly acquired SGTs written back to the cache; however, the KDC is not constantly hammered (which was the real problem).
Both pure GSS, and GSS with JAAS use a client principal subject. The subject has an in-memory privateCredentials set,
which is used to store TGTs and SGTs.
The key difference is:
"pure GSS": the subject + privateCredentials is created within the GSSContext, and lives only as long as the GSSContext lives.
GSS with JAAS: the subject is created by JAAS, outside the GSSContext, and thus can live for the life of the application,
spanning many GSSContexts during the life of the application.
The first GSSContext established will query the subject's privateCredentials for a SGT, not find one,
then request a SGT from the KDC.
The SGT is added to the subject's privateCredentials, and as the subject lives longer than the GSSContext,
it is available, as is the SGT, when following GSSContexts are created. These will find the SGT in the subject's privateCredentials, and do not need to hit the KDC for a new SGT.
So seen in the light of my particular Java Fat Client, opened once and likely to run for hours, everything is ok.
The first GSSContext created will hit the KDC for a SGT which will then be used by all following GSSContexts created until the client is closed.
The credentials cache is not being used, but that does not hurt.
In the light of a much shorter lived client, reopened many many times, and perhaps in parallel,
then use / non-use of the credentials cache might be a more serious issue.
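The difference between a per-context Subject and a long-lived Subject can be modelled with a toy example. This is only an illustration of the lookup order described above, not the real sun.security code path: ToyServiceTicket and the kdcRequests counter are made up, and only the javax.security.auth.Subject API is real.

```java
import java.util.Set;
import javax.security.auth.Subject;

/** Toy model of the SGT lookup; ToyServiceTicket and kdcRequests are invented for illustration. */
final class SubjectTicketDemo {
    /** Stand-in for a service ticket; the real credentials are Kerberos tickets. */
    static final class ToyServiceTicket {
        final String service;
        ToyServiceTicket(String service) { this.service = service; }
    }

    static int kdcRequests = 0; // counts simulated round trips to the KDC

    /** Looks for a ticket in the Subject's private credentials, else "asks the KDC". */
    static ToyServiceTicket ticketFor(Subject subject, String service) {
        Set<ToyServiceTicket> cached = subject.getPrivateCredentials(ToyServiceTicket.class);
        for (ToyServiceTicket ticket : cached) {
            if (ticket.service.equals(service)) {
                return ticket; // found in memory, no KDC traffic
            }
        }
        kdcRequests++; // simulated ticket request to the KDC
        ToyServiceTicket fresh = new ToyServiceTicket(service);
        subject.getPrivateCredentials().add(fresh);
        return fresh;
    }

    public static void main(String[] args) {
        String spn = "HTTP/app-srv.acme.com@ACME.COM";
        // "Pure GSS": a new Subject per context, so every call reaches the KDC.
        for (int i = 0; i < 3; i++) ticketFor(new Subject(), spn);
        System.out.println("per-context subject: " + kdcRequests + " KDC requests");

        kdcRequests = 0;
        // GSS with JAAS: one Subject outlives all contexts, so only the first call reaches the KDC.
        Subject longLived = new Subject();
        for (int i = 0; i < 3; i++) ticketFor(longLived, spn);
        System.out.println("long-lived subject: " + kdcRequests + " KDC requests");
    }
}
```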
private void initJAASandGSS() {
LoginContext loginContext = null;
TextCallbackHandler cbHandler = new TextCallbackHandler();
try {
loginContext = new LoginContext("wSOXClientGSSJAASLogin", cbHandler);
loginContext.login();
mySubject = loginContext.getSubject();
} catch (LoginException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
gssManager = GSSManager.getInstance();
try {
//TODO: LAMB: This name should be got from config / built from config / serviceIdentifier
serverName = gssManager.createName("HTTP/app-srv.acme.com@ACME.COM", null);
// Assign the krb5Oid field (not a new local) so that getGSSwJAASServiceToken() can use it
krb5Oid = new Oid("1.2.840.113554.1.2.2");
} catch (GSSException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
private String getGSSwJAASServiceToken() {
byte[] token = null;
String encodedToken = null;
token = Subject.doAs(mySubject, new PrivilegedAction<byte[]>(){
public byte[] run(){
try{
System.setProperty("javax.security.auth.useSubjectCredsOnly","true");
GSSContext context = gssManager.createContext(serverName,
krb5Oid,
null,
GSSContext.DEFAULT_LIFETIME);
context.requestMutualAuth(false);
context.requestConf(false);
context.requestInteg(true);
byte[] ret = new byte[0];
ret = context.initSecContext(ret, 0, ret.length);
context.dispose();
return ret;
} catch(Exception e){
Log.log(Log.ERROR, e);
throw new otms.util.OTMSRuntimeException("Start Client (Kerberos) failed, cause: " + e.getMessage());
}
}
});
encodedToken = Base64.getEncoder().encodeToString(token);
return encodedToken;
}
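The JAAS entry behind "wSOXClientGSSJAASLogin" is not shown in this answer. As a rough sketch, such an entry could also be supplied programmatically. The option values below (useTicketCache and doNotPrompt set to true) are assumptions chosen to match the statement that the TGT comes from the credentials cache, and TicketCacheJaasConfig is a hypothetical class name.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

/** Programmatic JAAS configuration with a single Krb5LoginModule entry (assumed options). */
final class TicketCacheJaasConfig extends Configuration {
    static final String ENTRY_NAME = "wSOXClientGSSJAASLogin";

    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        if (!ENTRY_NAME.equals(name)) {
            return null; // no entries for any other application name
        }
        Map<String, String> options = new HashMap<>();
        options.put("useTicketCache", "true"); // assumption: take the TGT from the ticket cache
        options.put("doNotPrompt", "true");    // assumption: fail instead of asking for a password
        return new AppConfigurationEntry[] {
            new AppConfigurationEntry(
                    "com.sun.security.auth.module.Krb5LoginModule",
                    LoginModuleControlFlag.REQUIRED,
                    options)
        };
    }
}
```

It would be installed with Configuration.setConfiguration(new TicketCacheJaasConfig()) before the LoginContext is created. Note that, as described earlier in this post, a cache of type KEYRING would still not be readable by the Java runtime.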
End Edit 2: Original answer below:
Question 1: Does the Java GSS-API not save service tickets to the credentials cache? Or can I change my code so it does so?
Edit: Root Cause Analysis.
After many hours debugging the sun.security.* classes, I now understand what GSS and Java Security code is doing / not doing - at least in Java 8 u 131.
In this example we have a credential cache, of a type Java GSS can access, containing a valid Ticket Granting Ticket (TGT) and a valid Service Ticket (SGT).
1) When the client principal Subject is created, the TGT is loaded from the cache (Credentials.acquireTGTFromCache()), and stored in the privateCredentials set of the Subject. --> (OK)
Only the TGT is loaded, SGTs are NOT loaded and saved to the Subject privateCredentials. -->(NOT OK)
2) Later, deep in the GSSContext.initSecContext() process, the security code actually tries to retrieve a Service Ticket from the privateCredentials of the Subject. The relevant code is Krb5Context.initSecContext() / KrbUtils.getTicket() / SubjectComber.find()/findAux(). However as SGTs were never loaded in step 1) an SGT will not be found! Therefore a new SGT is requested from the KDC and used.
This is repeated for each Service request.
Just for fun, and strictly as a proof-of-concept hack, I added a few lines of code between the login, and the initSecContext() to parse the credentials cache, extract the credentials, convert to Krb Credentials, and add them to the Subject’s private credentials.
This done, in step 2) the existing SGT is found and used. No new SGT is requested from the KDC.
I will not post the code for this hack as it calls sun internal classes that we should not be calling, and I don’t wish to inspire anybody else to do so. Nor do I intend to use this hack as a solution.
--> The root cause is not that the service tickets are not SAVED to the cache, but rather
a) that SGTs are not LOADED from the credential cache to the Subject of the client principal
and
b) that there is no public API or configuration settings to do so.
This affects GSS-API both with and without JAAS.
So where does this leave me?
i) Use Java GSS-API / GSS-API with JAAS “as is”, with each SGT request hitting the KDC --> Not good.
ii) As suggested by Samson in the comments below, use Java GSS-API only for the initial login of the application, then use an alternative security mechanism (a kind of self-built Kerberos-light) based on tokens or cookies for all subsequent calls.
iii) Consider alternatives to GSS-API such as Apache Kerby kerb-client. This has implications outside the scope of this answer, and may well prove to be jumping from the proverbial frying pan into the fire.
I have submitted a Java Feature Request to Oracle, suggesting that SGTs should be retrieved from the cache and stored in the Subject credentials (as already the case for TGTs).
http://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8180144
Question 2: Is there any downside to the fact that the service ticket is not saved to the cache?
Using the credentials cache for Service Tickets reduces interaction between the client and the KDC. The corollary to this is that where service tickets are not cached, each request will require interaction with the KDC, which could lead to the KDC being hammered.

How to obtain renewable kerberos tickets using java GSS+JAAS

I am using jTDS to connect to SQL Server. Internally, jTDS uses GSS to obtain a Kerberos service ticket and establish a secure context. Since my app is long-lived and my connections are kept alive the entire time, I need that Kerberos service ticket to be renewable so that SQL Server can renew it on its own (the KDC policies are set to expire all tickets after 12 hours).
What jTDS does to obtain a kerberos token is (more or less) the following:
GSSManager manager = GSSManager.getInstance();
// Oids for Kerberos5
Oid mech = new Oid("1.2.840.113554.1.2.2");
Oid nameType = new Oid("1.2.840.113554.1.2.2.1");
// Canonicalize hostname to create SPN like MIT Kerberos does
GSSName serverName = manager.createName("MSSQLSvc/" + host + ":" + port, nameType);
GSSContext gssContext = manager.createContext(serverName, mech, null, GSSContext.DEFAULT_LIFETIME);
gssContext.requestMutualAuth(false);
gssContext.requestCredDeleg(true);
byte[] ticket = gssContext.initSecContext(new byte[0], 0, 0);
What I suspect is that the ticket I am obtaining is not renewable. I am checking that by doing something like the following:
ExtendedGSSContext extendedContext = (ExtendedGSSContext) gssContext;
boolean[] flags = (boolean[]) extendedContext.inquireSecContext(InquireType.KRB5_GET_TKT_FLAGS);
System.out.println("Renewable = " + flags[8]);
In our particular configuration GSS gets the Kerberos TGT from the JAAS login module. We set -Djavax.security.auth.useSubjectCredsOnly=false, and in the login.cfg file we have the following login module configured:
com.sun.security.jgss.krb5.initiate {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="/home/batman/.batman.ktab"
principal="batman@GOTHAMCITY.INT"
storeKey=true
doNotPrompt=true
debug=false
};
Another thing I notice is that the getLifetime() method of GSSContext doesn't seem to work. It always returns 2147483647 (max int) no matter what the real lifetime of the ticket is.
I feel comfortable with branching the jTDS driver, so I can modify the way it establishes a GSS context if needed.
What I tried:
Use native implementation of GSS api:
This works fine for me in terms of obtaining renewable tickets, but it imposes another set of issues (ensuring that the ticket cache is properly set up and that the tickets in it are properly renewed). If I can avoid this option, that would be nice. One thing I observe here is that the getLifetime() method actually returns the real lifetime of the ticket in seconds.
Reimplementing KerberosLoginModule:
Based on the answer to the question "Jaas - Requesting Renewable Kerberos Tickets", I reimplemented the LoginModule to set the RENEW KDCOption in KrbAsReqBuilder before requesting a TGT. That works in the sense that I obtain a renewable TGT, but the ticket GSS obtains from that TGT is still not renewable. If I set a breakpoint in the constructor of the KDCOption object and set the RENEW flag manually on each request (even the KrbTgsReq done by GSS), it works, but putting that change into production would require a major rewrite of GSS, which I don't feel comfortable with.
For administrators, the fact that Kerberos tickets have a limited lifetime is an important security feature. A user knows the password, so he/she can get a new ticket at any moment. For an intruder, however, it is a problem: once the ticket expires, it can no longer be used to break into the system. Administrators want this lifetime to be as short as possible, but not too short (like 1 hour), because users would then generate roughly 10x more login requests than now, which would be tough for Active Directory to handle.
When we need to authenticate with Kerberos, we should use connection pooling (and a DataSource). To use this feature in jTDS you need to add a connection pool implementation (recommended: DBCP or c3p0, see: http://jtds.sourceforge.net/features.html).
If you'd like to write your application the older way of connecting to a database (without a DataSource, i.e. creating a connection and keeping it alive because it's expensive to create), then the next obstacle would be the 'renew lifetime'. In Active Directory, Kerberos tickets can by default be renewed within 7 days. There's a global setting in AD that can be set to 0 (an indefinite renew lifetime), but you'd need to persuade the domain administrator to lower the security of the whole domain just because one service wouldn't run without it.

How to continue on client when heavy server computation is done

This might be a simple problem, but I can't seem to find a good solution right now.
I've got:
OldApp - a Java application started from the command line (no web front here)
NewApp - a Java application with a REST api behind Apache
I want OldApp to call NewApp through its REST api and when NewApp is done, OldApp should continue.
My problem is that NewApp is doing a lot of stuff that might take a lot of time which in some cases causes a timeout in Apache, and then sends a 502 error to OldApp. The computations continue in NewApp, but OldApp does not know when NewApp is done.
One solution I thought of is fork a thread in NewApp and store some kind of ID for the API request, and return it to OldApp. Then OldApp could poll NewApp to see if the thread is done, and if so - continue. Otherwise - keep polling.
Are there any good design patterns for something like this? Am I complicating things? Any tips on how to think?
If NewApp is taking a long time, it should immediately return a 202 Accepted. The response should contain a Location header indicating where the user can go to look up the result when it's done, and an estimate of when the request will be done.
OldApp should wait until the estimate time is reached, then submit a new GET call to the location. The response from that GET will either be the expected data, or an entity with a new estimated time. OldApp can then try again at the later time, repeating until the expected data is available.
So the conversation might look like:
POST /widgets
response:
202 Accepted
Location: "http://server/v1/widgets/12345"
{
"estimatedAvailableAt": "<whenever>"
}
.
GET /widgets/12345
response:
200 OK
Location: "http://server/v1/widgets/12345"
{
"estimatedAvailableAt": "<wheneverElse>"
}
.
GET /widgets/12345
response:
200 OK
Location: "http://server/v1/widgets/12345"
{
"myProperty": "myValue",
...
}
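On the OldApp side, the conversation above boils down to a polling loop. The following is a sketch only, assuming Java 11+ (java.net.http): ResultPoller is a hypothetical name, the attempt limit and fixed delay are arbitrary choices, and a response is treated as "not ready" whenever its body still carries the estimatedAvailableAt field from the example.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;
import java.util.function.Supplier;

/** Polls a job URL until the result is available (hypothetical helper). */
final class ResultPoller {
    private ResultPoller() {}

    /** Calls check until it yields a value, sleeping between attempts; empty if attempts run out. */
    static Optional<String> pollUntilReady(Supplier<Optional<String>> check,
                                           int maxAttempts, long delayMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<String> result = check.get();
            if (result.isPresent()) {
                return result;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delayMillis);
            }
        }
        return Optional.empty();
    }

    /** One GET against the job URL; a non-200 status or a body with an estimate counts as pending. */
    static Supplier<Optional<String>> httpCheck(HttpClient client, URI jobLocation) {
        return () -> {
            try {
                HttpRequest request = HttpRequest.newBuilder(jobLocation).GET().build();
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                String body = response.body();
                boolean pending = response.statusCode() != 200
                        || body.contains("\"estimatedAvailableAt\"");
                return pending ? Optional.empty() : Optional.of(body);
            } catch (java.io.IOException e) {
                throw new java.io.UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while polling", e);
            }
        };
    }
}
```

OldApp would take the Location header from the 202 response and call pollUntilReady(httpCheck(client, location), attempts, delay). A more faithful client would sleep until the advertised estimate instead of using a fixed delay.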
Yes, that's exactly what people are doing with REST now. Because there is no way to connect from the server to the client, the client just polls very often. There is also an improved method called "long polling", where the connection between client and server has a long timeout and the server sends the information back to the connected client when it becomes available.
The question is about Java and servlets, so I would suggest looking at Servlet 3.0 asynchronous support.
From a design perspective, you would need to return a 202 Accepted with an ID and a URL for the job. OldApp needs to check for the result of the operation using that URL.
The thread that you fork on the server needs to implement the Callable interface. I would also recommend using a thread pool for this. The GET url for the Job that was forked can check the Future object status and return it to the user.
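The Callable / thread pool / Future combination could look roughly like this on the NewApp side. It is a sketch with a hypothetical JobRegistry class; the REST layer is left out, and the work is expected to return a non-null value.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Keeps track of long-running jobs by id (hypothetical helper, REST layer omitted). */
final class JobRegistry<T> implements AutoCloseable {
    private final ExecutorService pool;
    private final Map<String, Future<T>> jobs = new ConcurrentHashMap<>();

    JobRegistry(int threads) {
        pool = Executors.newFixedThreadPool(threads);
    }

    /** Starts the work and returns the id to embed in the URL of the 202 response. */
    String submit(Callable<T> work) {
        String id = UUID.randomUUID().toString();
        jobs.put(id, pool.submit(work));
        return id;
    }

    /** Backs the GET handler: empty while the job is running, the result once it is done. */
    Optional<T> result(String id) throws ExecutionException, InterruptedException {
        Future<T> future = jobs.get(id);
        if (future == null) {
            throw new IllegalArgumentException("unknown job id: " + id);
        }
        return future.isDone() ? Optional.of(future.get()) : Optional.empty();
    }

    @Override
    public void close() {
        pool.shutdownNow();
    }
}
```

The POST handler would call submit() and answer 202 with the id; the GET handler would call result() and either return the data or report that the job is still pending. A real service would also need to evict finished jobs at some point, which this sketch does not do.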

AWS Error Message: InvalidInstanceID.NotFound

I'm trying to start an Amazon EC2 cloud machine with the startInstance method using the aws-sdk in Java. My code is as follows.
public String startInstance(String instanceId) throws Exception {
List<String> instanceIds = new ArrayList<String>();
instanceIds.add(instanceId);
StartInstancesRequest startRequest = new StartInstancesRequest(
instanceIds);
startRequest.setRequestCredentials(getCredentials());
StartInstancesResult startResult = ec2.startInstances(startRequest);
List<InstanceStateChange> stateChangeList = startResult
.getStartingInstances();
log.trace("Starting instance '{}':", instanceId);
// Wait for the instance to be started
return waitForTransitionCompletion(stateChangeList, "running",
instanceId);
}
When I run the above code, I'm getting the following AWS error:
Status Code: 400, AWS Request ID: e1bd4795-a609-44d1-9e80-43611e80006b, AWS Error Code: InvalidInstanceID.NotFound, AWS Error Message: The instance ID 'i-2b97ac2f' does not exist
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:538)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:283)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:168)
at com.amazonaws.services.ec2.AmazonEC2Client.invoke(AmazonEC2Client.java:5208)
at com.amazonaws.services.ec2.AmazonEC2Client.startInstances(AmazonEC2Client.java:2426)
AWS Error Message: The instance ID 'i-2b97ac2f' does not exist
You'll have to take the AWS response for granted here, i.e. the instance does not exist ;)
But seriously: presumably you have already verified that you are actually running an instance with this ID in your account? Then this is most likely caused by targeting the wrong API endpoint, insofar as an instance ID is only valid within a specific region (if not specified, the region defaults to 'us-east-1', see below).
In this case you need to specify the actual instance region via the setEndpoint() method of the AmazonEC2Client object within the apparently global ec2 variable before calling startInstances().
There are some examples in "Using Regions with the AWS SDKs", and all currently available AWS regional endpoint URLs are listed in "Regions and Endpoints". Specifically, Amazon Elastic Compute Cloud (EC2) defaults to 'us-east-1':
If you just specify the general endpoint (ec2.amazonaws.com), Amazon
EC2 directs your request to the us-east-1 endpoint.
We run a service (Qubole) that frequently spawns and then tags (and in some cases terminates) AWS instances immediately.
We have found that Amazon will, every once in a while, claim an instanceid as invalid - even though it has just created it. Retrying a few times with some sleep time thrown in usually solves the problem. Even a total retry interval of 15s proved insufficient in rare cases.
This experience comes from the useast region. We do not make api calls to different regions - so that is not an explanation. More likely - this is the infamous eventual consistency at work - where AWS is unable to provide read-after-write consistency for these api calls.
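The "retry a few times with some sleep" approach can be sketched as a small helper. This is not Qubole's code: Retry.withBackoff is a hypothetical name, and the doubling delay is a design choice made for this example rather than something stated above.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Retries a call while the failure looks transient (hypothetical helper). */
final class Retry {
    private Retry() {}

    /**
     * Runs call up to maxAttempts times. Between attempts it sleeps, doubling the delay
     * each time. Failures the predicate rejects, and the last failure, are rethrown.
     */
    static <T> T withBackoff(Callable<T> call, Predicate<Exception> isRetryable,
                             int maxAttempts, long initialDelayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !isRetryable.test(e)) {
                    throw e;
                }
                Thread.sleep(delay);
                delay *= 2; // back off so repeated "not found" answers get more time
            }
        }
    }
}
```

With the AWS SDK for Java, the predicate would presumably test for an AmazonServiceException whose getErrorCode() equals "InvalidInstanceID.NotFound", so that genuine errors such as missing permissions are not retried.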
I am using the AWS Ruby API and noticed the same issue when creating an AMI image: its status is pending when I look in the AWS console, but after a while the image is available for use.
Here is my script
image = ec2.images.create(:name => image_name, :instance_id => ami_id, :description => desc)
sleep 5 while image.state != :available
I sleep in 5-second intervals until the image is available, but I get an error saying "AWS Error Message: InvalidInstanceID.NotFound". During my testing this is fine, but it fails most of the time during continuous integration builds.
InvalidInstanceID.NotFound means the specified instance does not exist.
Ensure that you have indicated the region in which the instance is located, if it's not in the default region.
This error may occur because the ID of a recently created instance has not propagated through the system. For more information, see Eventual Consistency.
