SearchRequest in RootDSE - java

I have the following function to query users from an AD server:
public List<LDAPUserDTO> getUsersWithPaging(String filter)
{
    List<LDAPUserDTO> userList = new ArrayList<>();
    try (LDAPConnection connection = new LDAPConnection(config.getHost(), config.getPort(), config.getUsername(), config.getPassword()))
    {
        SearchRequest searchRequest = new SearchRequest("", SearchScope.SUB, filter, null);
        ASN1OctetString resumeCookie = null;
        while (true)
        {
            searchRequest.setControls(
                    new SimplePagedResultsControl(100, resumeCookie));
            SearchResult searchResult = connection.search(searchRequest);
            for (SearchResultEntry e : searchResult.getSearchEntries())
            {
                LDAPUserDTO tmp = new LDAPUserDTO();
                tmp.distinguishedName = e.getAttributeValue("distinguishedName");
                tmp.name = e.getAttributeValue("name");
                userList.add(tmp);
            }
            LDAPTestUtils.assertHasControl(searchResult,
                    SimplePagedResultsControl.PAGED_RESULTS_OID);
            SimplePagedResultsControl responseControl =
                    SimplePagedResultsControl.get(searchResult);
            if (responseControl.moreResultsToReturn())
            {
                resumeCookie = responseControl.getCookie();
            }
            else
            {
                break;
            }
        }
        return userList;
    } catch (LDAPException e) {
        logger.error(e.getExceptionMessage());
        return null;
    }
}
However, this breaks when I try to search on the RootDSE.
What I've tried so far:
baseDN = null
baseDN = "";
baseDN = RootDSE.getRootDSE(connection).getDN()
baseDN = "RootDSE"
All resulting in various exceptions or empty results:
Caused by: LDAPSDKUsageException(message='A null object was provided where a non-null object is required (non-null index 0).
2020-04-01 10:42:22,902 ERROR [de.dbz.service.LDAPService] (default task-1272) LDAPException(resultCode=32 (no such object), numEntries=0, numReferences=0, diagnosticMessage='0000208D: NameErr: DSID-03100213, problem 2001 (NO_OBJECT), data 0, best match of:
''
', ldapSDKVersion=4.0.12, revision=aaefc59e0e6d110bf3a8e8a029adb776f6d2ce28')

So, I really spent a lot of time on this. It is possible to query the RootDSE, but it's not as straightforward as one might think.
I mainly used Wireshark to see what the guys at Softerra are doing with their LDAP Browser.
Turns out I wasn't that far off:
In the captured request, the baseObject is empty.
Also, there is one additional Control with the OID LDAP_SERVER_SEARCH_OPTIONS_OID and the ASN.1 String 308400000003020102.
So what does this 308400000003020102 (more readable: 30 84 00 00 00 03 02 01 02) actually do?
First of all, we decode this into something we can read: 30 marks a SEQUENCE, 84 says the length is encoded in the following four octets (00 00 00 03, i.e. 3 content bytes), and 02 01 02 is an INTEGER of length 1 with the value 2.
In binary, this gives us: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
As we know from the documentation, we have the following notation:
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
|---|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-------|-------|
| x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | SSFPR | SSFDS |
or we just take the int values from the documentation:
1 = SSFDS -> SERVER_SEARCH_FLAG_DOMAIN_SCOPE
2 = SSFPR -> SERVER_SEARCH_FLAG_PHANTOM_ROOT
So, in my example, we have SSFPR which is defined as follows:
For AD DS, instructs the server to search all NC replicas except
application NC replicas that are subordinate to the search base, even
if the search base is not instantiated on the server. For AD LDS, the
behavior is the same except that it also includes application NC
replicas in the search. For AD DS and AD LDS, this will cause the
search to be executed over all NC replicas (except for application NCs
on AD DS DCs) held on the DC that are subordinate to the search base.
This enables search bases such as the empty string, which would cause
the server to search all of the NC replicas (except for application
NCs on AD DS DCs) that it holds.
NC stands for Naming Context, and those are stored as an operational attribute of the RootDSE named namingContexts.
The other value, SSFDS, does the following:
Prevents continuation references from being generated when the search
results are returned. This performs the same function as the
LDAP_SERVER_DOMAIN_SCOPE_OID control.
So, someone might ask why I even do this. As it turns out, I have a customer with several sub-DCs under one DC. If I tell the search to follow referrals, the execution time is far too long, so that wasn't really an option for me. But with referrals turned off, I wasn't getting all the results when I set the base DN to the group whose members I wanted to retrieve.
Searching via the RootDSE option in Softerra's LDAP Browser was way faster and returned the results in less than one second.
I personally have no clue why this is so much faster - Active Directory without any interface or tool from Microsoft is kind of black magic for me anyway. But to be frank, that's not really my area of expertise.
In the end, I ended up with the following Java code:
SearchRequest searchRequest = new SearchRequest("", SearchScope.SUB, filter, null);
[...]
Control globalSearch = new Control("1.2.840.113556.1.4.1340", true, new ASN1OctetString(Hex.decode("308400000003020102")));
searchRequest.setControls(new SimplePagedResultsControl(100, resumeCookie, true), globalSearch);
[...]
The Hex.decode() used here is org.bouncycastle.util.encoders.Hex.
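If you'd rather not pull in BouncyCastle just for Hex.decode(), the same control value can also be built with the UnboundID SDK's own ASN.1 classes. A minimal sketch (this encoder emits the short-form length 30 03 02 01 02 instead of the long-form 30 84 00 00 00 03 02 01 02 captured above; both are valid BER encodings of the same SEQUENCE, so the server should accept either):

import com.unboundid.asn1.ASN1Integer;
import com.unboundid.asn1.ASN1OctetString;
import com.unboundid.asn1.ASN1Sequence;
import com.unboundid.ldap.sdk.Control;

// LDAP_SERVER_SEARCH_OPTIONS_OID with SERVER_SEARCH_FLAG_PHANTOM_ROOT (2)
ASN1Sequence searchOptionsValue = new ASN1Sequence(new ASN1Integer(2));
Control globalSearch = new Control(
        "1.2.840.113556.1.4.1340",
        true,
        new ASN1OctetString(searchOptionsValue.encode()));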
A huge thanks to the guys at Softerra, who more or less put an end to my journey into the abyss of AD.

You can't query users from the RootDSE.
Use either a domain, or if you need to query users across domains in a forest, use the global catalog (which runs on different ports - 3268/3269 - not the default 389/636 for LDAP(S)).
RootDSE only contains metadata. This question should probably be asked elsewhere for more details, but first read up on the documentation from Microsoft, e.g.:
https://learn.microsoft.com/en-us/windows/win32/ad/where-to-search
https://learn.microsoft.com/en-us/windows/win32/adschema/rootdse
E.g. the namingContexts attribute can be read to find which naming contexts you may want to query for actual users.
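For instance, with the UnboundID SDK already used in the question, the naming contexts can be read directly from the RootDSE - a small sketch (the connection parameters are placeholders):

import com.unboundid.ldap.sdk.LDAPConnection;
import com.unboundid.ldap.sdk.LDAPException;
import com.unboundid.ldap.sdk.RootDSE;

public static void printNamingContexts(String host, int port, String bindDN, String password)
        throws LDAPException
{
    try (LDAPConnection connection = new LDAPConnection(host, port, bindDN, password))
    {
        RootDSE rootDSE = RootDSE.getRootDSE(connection);
        // each value is the DN of a naming context you can use as a search base
        for (String namingContext : rootDSE.getNamingContextDNs())
        {
            System.out.println(namingContext);
        }
    }
}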
Maybe start with this nice article as introduction:
http://cbtgeeks.com/2016/06/02/what-is-rootdse/

Related

SQL query using Spark and java language

I have two dataframes in Spark.
The first, dataframe1, is:
+--------------+--------------+--------------+
|id_z |longitude |latitude |
+--------------+--------------+--------------+
|[12,20,30 ] |-7.0737816 | 33.82666 |
|13 |-7.5952683 | 33.5441916 |
+--------------+--------------+--------------+
The second, dataframe2, is:
+--------------+--------------+---------------+
|id_z2 |longitude2 |latitude2 |
+--------------+--------------+---------------+
| 14 |-8.5952683 | 38.5441916 |
| 12 |-7.0737816 | 33.82666 |
+--------------+--------------+---------------+
I want to apply the logic of the following query.
String sql = "SELECT * FROM dataframe2 WHERE dataframe2 .id_z2 IN ("'"
+ id_z +"'") and longitude2 = "'"+longitude+"'" and latitude = "'"+latitude+"'"";
I would prefer not to use a join - is it possible to do this?
I really need your help, or just a starting point that will make things easier for me.
Thank you.

Janusgraph 4.0 composite index from Java app does not work

I am building my index like this:
graph = JanusGraphFactory.open("conf/janusgraph-cql-es-server.properties");
final JanusGraphManagement mt = graph.openManagement();
PropertyKey key = mt.getPropertyKey("myID");
mt.buildIndex("byID", Vertex.class).addKey(key).buildCompositeIndex();
mt.commit();
ManagementSystem.awaitGraphIndexStatus(graph, "byID").call();
...
final JanusGraphManagement updateMt = graph.openManagement();
updateMt.updateIndex(updateMt.getGraphIndex("byID"), SchemaAction.REINDEX).get();
updateMt.commit();
But when I call:
graph.traversal().V().has("myID", "100");
I get a full scan, which returns the correct result:
o.j.g.transaction.StandardJanusGraphTx : Query requires iterating over all vertices [(myID = 100)]. For better performance, use indexes
Also if I print the schema I have:
---------------------------------------------------------------------------------------------------
Vertex Index Name | Type | Unique | Backing | Key: Status |
---------------------------------------------------------------------------------------------------
byID | Composite | false | internalindex | myID: INSTALLED |
---------------------------------------------------------------------------------------------------
Edge Index (VCI) Name | Type | Unique | Backing | Key: Status |
---------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------
Relation Index | Type | Direction | Sort Key | Order | Status |
---------------------------------------------------------------------------------------------------
Also, looking at the backing, it says internalindex; I wonder if I misconfigured something.
edit:
There were 2 problems.
The index status was INSTALLED, not ENABLED.
For string properties you also need to do:
mgmt.buildIndex('byID', Vertex.class).addKey(ID, Mapping.TEXT.asParameter())...
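For reference, the usual lifecycle for a brand-new composite index looks roughly like the sketch below. It is untested, reuses the graph, key, and index names from above, and relies on the REGISTERED/ENABLED statuses and the ENABLE_INDEX action from JanusGraph's management API; exception handling is omitted:

import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.janusgraph.core.PropertyKey;
import org.janusgraph.core.schema.JanusGraphManagement;
import org.janusgraph.core.schema.SchemaAction;
import org.janusgraph.core.schema.SchemaStatus;
import org.janusgraph.graphdb.database.management.ManagementSystem;

JanusGraphManagement mt = graph.openManagement();
PropertyKey key = mt.getPropertyKey("myID");
mt.buildIndex("byID", Vertex.class).addKey(key).buildCompositeIndex();
mt.commit();
graph.tx().rollback(); // make sure no open transactions block the status change

// wait until the index is at least REGISTERED, then enable it
ManagementSystem.awaitGraphIndexStatus(graph, "byID")
        .status(SchemaStatus.REGISTERED)
        .call();

JanusGraphManagement updateMt = graph.openManagement();
updateMt.updateIndex(updateMt.getGraphIndex("byID"), SchemaAction.ENABLE_INDEX).get();
updateMt.commit();

// finally wait for ENABLED before querying
ManagementSystem.awaitGraphIndexStatus(graph, "byID")
        .status(SchemaStatus.ENABLED)
        .call();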
Shot in the dark, but it looks like you are not creating the PropertyKey myID before trying to use it.
Try something like:
final JanusGraphManagement mt = graph.openManagement();
PropertyKey key = mt.getPropertyKey("myID");
if (key == null) {
    key = mt.makePropertyKey("myID").dataType(String.class).make();
}
mt.buildIndex("byID", Vertex.class).addKey(key).buildCompositeIndex();
mt.commit();

Spark and non-denormalized tables

I know Spark works much better with denormalized tables, where all the needed data is in one row. I am wondering, when that is not the case, whether there is a way to retrieve data from previous or next rows.
Example:
Formula:
value = (value from 2 years ago) + (current year value) / (value from 2 years ahead)
Table
+-------+-----+
| YEAR|VALUE|
+-------+-----+
| 2015| 100 |
| 2016| 34 |
| 2017| 32 |
| 2018| 22 |
| 2019| 14 |
| 2020| 42 |
| 2021| 88 |
+-------+-----+
Dataset<Row> dataset ...
Dataset<Results> results = dataset.map(row -> {
    int currentValue = Integer.valueOf(row.getAs("VALUE")); // 2019
    // non sense code just to exemplify
    int twoYearsBackValue = Integer.valueOf(row[???].getAs("VALUE")); // 2016
    int twoYearsAheadValue = Integer.valueOf(row[???].getAs("VALUE")); // 2021
    double resultValue = twoYearsBackValue + currentValue / twoYearsAheadValue;
    return new Result(2019, resultValue);
});
Results[] results = results.collect();
Is it possible to grab these values (that belong to other rows) without changing the table format (no denormalization, no pivots...) and also without collecting the data, or does it go totally against Spark/big data principles?
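One possible starting point (not a full answer): Spark's window functions lag and lead can reference neighbouring rows without collecting or reshaping the table. A rough sketch in Java, assuming the YEAR/VALUE columns from the example (Window.orderBy without a partition pulls everything into one partition, which is fine for a small table like this):

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lag;
import static org.apache.spark.sql.functions.lead;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;

WindowSpec byYear = Window.orderBy(col("YEAR"));

Dataset<Row> withNeighbours = dataset
        .withColumn("TWO_YEARS_BACK", lag(col("VALUE"), 2).over(byYear))
        .withColumn("TWO_YEARS_AHEAD", lead(col("VALUE"), 2).over(byYear));

// value = (value from 2 years ago) + (current value) / (value from 2 years ahead)
Dataset<Row> results = withNeighbours.withColumn(
        "RESULT",
        col("TWO_YEARS_BACK").plus(col("VALUE").divide(col("TWO_YEARS_AHEAD"))));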

Elasticsearch - how to group by and count matches in an index

I have an instance of Elasticsearch running with thousands of documents. My index has 2 fields like this:
| type    | date_added              |
| walking | 2018-11-27T00:00:00.000 |
| walking | 2018-11-26T00:00:00.000 |
| running | 2018-11-24T00:00:00.000 |
| running | 2018-11-25T00:00:00.000 |
| walking | 2018-11-27T04:00:00.000 |
I want to group by and count how many matches were found for the "type" field, in a certain range.
In SQL I would do something like this:
select type,
count(type)
from index
where date_added between '2018-11-20' and '2018-11-30'
group by type
I want to get something like this:
| type | count |
| running | 2 |
| walking | 3 |
I'm using the High Level REST Client API in my project. So far my query looks like this; it only filters by the start and end time:
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders
        .boolQuery()
        .must(QueryBuilders
                .rangeQuery("date_added")
                .from(start.getTime())
                .to(end.getTime())));
How can I do a "group by" in the "type" field? Is it possible to do this in ElasticSearch?
That's a good start! Now you need to add a terms aggregation to your query:
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.boolQuery()
        .must(QueryBuilders
                .rangeQuery("date_added")
                .from(start.getTime())
                .to(end.getTime())));
// add these two lines
TermsAggregationBuilder groupBy = AggregationBuilders.terms("byType").field("type.keyword");
sourceBuilder.aggregation(groupBy);
After using Val's reply to aggregate the fields, I wanted to print the aggregations of my query together with their values. Here's what I did:
Terms terms = searchResponse.getAggregations().get("byType");
Collection<Terms.Bucket> buckets = (Collection<Terms.Bucket>) terms.getBuckets();
for (Terms.Bucket bucket : buckets) {
    System.out.println("Type: " + bucket.getKeyAsString() + " = Count(" + bucket.getDocCount() + ")");
}
This is the output after running the query in an index with 2700 documents with a field called "type" and 2 different types:
Type: walking = Count(900)
Type: running = Count(1800)

Attempting to count unique users between two categories in Spark

I have a Dataset structure in Spark with two columns, one called user and the other called category, such that the table looks somewhat like this:
+---------------+---------------+
| user| category|
+---------------+---------------+
| garrett| syncopy|
| garrison| musictheory|
| marta| sheetmusic|
| garrett| orchestration|
| harold| chopin|
| marta| russianmusic|
| niko| piano|
| james| sheetmusic|
| manny| violin|
| charles| gershwin|
| dawson| cello|
| bob| cello|
| george| cello|
| george| americanmusic|
| bob| personalcompos|
| george| sheetmusic|
| fred| sheetmusic|
| bob| sheetmusic|
| garrison| sheetmusic|
| george| musictheory|
+---------------+---------------+
only showing top 20 rows
Each row in the table is unique, but a user and a category can appear multiple times. The objective is to count the number of users that two categories share. For example, cello and americanmusic share a user named george, and musictheory and sheetmusic share the users george and garrison. The goal is to get the number of distinct users between n categories, meaning that there are at most n squared edges between categories. I partially understand how to do this operation, but I am struggling a little bit converting my thoughts to Spark Java.
My thinking is that I need to do a self-join on user to get a table that would be structured like this:
+---------------+---------------+---------------+
| user| category| category|
+---------------+---------------+---------------+
| garrison| musictheory| sheetmusic|
| george| musictheory| sheetmusic|
| garrison| musictheory| musictheory|
| george| musictheory| musictheory|
| garrison| sheetmusic| musictheory|
| george| sheetmusic| musictheory|
+---------------+---------------+---------------+
The self join operation in Spark (Java code) is not difficult:
Dataset<Row> newDataset = allUsersToCategories.join(allUsersToCategories, "user");
This is getting somewhere; however, I get mappings to the same category, as in rows 3 and 4 of the above example, and I get backwards mappings where the categories are reversed, which essentially double counts each user interaction, as in rows 5 and 6 of the above example.
What I believe I need to do is put some sort of conditional in my join, something along the lines of X < Y, so that equal categories and duplicates get thrown away. Finally, I need to count the number of distinct rows for the n squared combinations, where n is the number of categories.
Could somebody please explain how to do this in Spark and specifically Spark Java since I am a little unfamiliar with the Scala syntax?
Thanks for the help.
I'm not sure if I understand your requirements correctly, but I will try to help.
According to my understanding, the expected result for the above data should look like the table below. If that's not right, please let me know and I will try to make the required modifications.
+--------------+--------------+-----+
|_1            |_2            |count|
+--------------+--------------+-----+
|personalcompos|sheetmusic    |1    |
|cello         |musictheory   |1    |
|americanmusic |cello         |1    |
|cello         |sheetmusic    |2    |
|cello         |personalcompos|1    |
|russianmusic  |sheetmusic    |1    |
|americanmusic |sheetmusic    |1    |
|americanmusic |musictheory   |1    |
|musictheory   |sheetmusic    |2    |
|orchestration |syncopy       |1    |
+--------------+--------------+-----+
In this case you can solve your problem with below Scala code:
allUsersToCategories
  .groupByKey(_.user)
  .flatMapGroups { case (user, userCategories) =>
    val categories = userCategories.map(uc => uc.category).toSeq
    for {
      c1 <- categories
      c2 <- categories
      if c1 < c2
    } yield (c1, c2)
  }
  .groupByKey(x => x)
  .count()
  .show()
If you need a symmetric result, you can just change the if statement in the flatMapGroups transformation to if c1 != c2.
Please note that in above example I used Dataset API, which for test purpose was created with below code:
case class UserCategory(user: String, category: String)
val allUsersToCategories = session.createDataset(Seq(
UserCategory("garrett", "syncopy"),
UserCategory("garrison", "musictheory"),
UserCategory("marta", "sheetmusic"),
UserCategory("garrett", "orchestration"),
UserCategory("harold", "chopin"),
UserCategory("marta", "russianmusic"),
UserCategory("niko", "piano"),
UserCategory("james", "sheetmusic"),
UserCategory("manny", "violin"),
UserCategory("charles", "gershwin"),
UserCategory("dawson", "cello"),
UserCategory("bob", "cello"),
UserCategory("george", "cello"),
UserCategory("george", "americanmusic"),
UserCategory("bob", "personalcompos"),
UserCategory("george", "sheetmusic"),
UserCategory("fred", "sheetmusic"),
UserCategory("bob", "sheetmusic"),
UserCategory("garrison", "sheetmusic"),
UserCategory("george", "musictheory")
))
I was trying to provide an example in Java, but I don't have any experience with Java+Spark and it is too time-consuming for me to migrate the above example from Scala to Java...
I found the answer a couple of hours ago using Spark SQL:
Dataset<Row> connectionsPerSharedUser = spark.sql("SELECT a.user AS user, "
        + "a.category AS categoryOne, "
        + "b.category AS categoryTwo "
        + "FROM allTable AS a INNER JOIN allTable AS b "
        + "ON a.user = b.user AND a.category < b.category");
This will then create a Dataset with three columns user, categoryOne, and categoryTwo. Each row will be unique and will indicate when the user exists in both categories.
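For comparison, the same conditional self-join can also be written with the DataFrame API instead of Spark SQL. A rough, untested sketch; the allUsersToCategories Dataset and column names are taken from the question, and the pair count per category pair is added at the end:

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.countDistinct;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

Dataset<Row> a = allUsersToCategories.as("a");
Dataset<Row> b = allUsersToCategories.as("b");

// join on the same user, keep only one ordering of each category pair
Dataset<Row> pairs = a.join(b,
        col("a.user").equalTo(col("b.user"))
                .and(col("a.category").lt(col("b.category"))));

// count how many distinct users each pair of categories shares
Dataset<Row> sharedUsers = pairs
        .groupBy(col("a.category").as("categoryOne"), col("b.category").as("categoryTwo"))
        .agg(countDistinct(col("a.user")).as("sharedUsers"));

sharedUsers.show();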
