How can C# override the LDAP server limit, but not Java?

I'm rewriting my C# program in Java and became very curious about the fact that the C# application can extract tens of thousands of users with this trick:
DirectorySearcher search = new DirectorySearcher(entry);
search.SizeLimit = 99000;
search.PageSize = 98000;
but my Java programs firmly say
LDAPSearchException(resultCode=4 (size limit exceeded), numEntries=1000, numReferences=0, errorMessage='size limit exceeded')
I tried both the UnboundID and the standard libraries. I found a million discussions about this problem, and everywhere it is said that the limitation is on the server and you can do nothing about it.
But my C# application does work! How can this happen? Secret techniques from Microsoft that cannot be repeated by other vendors?
Just in case, here is my UnboundID code:
SearchRequest searchRequest = new SearchRequest(path, SearchScope.SUB, filter, "SamAccountName");
searchRequest.setSizeLimit(99000);
searchRequest.setTimeLimitSeconds(999);
SearchResult result = connection.search(searchRequest);
for (SearchResultEntry sre : result.getSearchEntries()) {
    System.out.println(count++ + ": " + sre.toString());
}
P.S. I do not want to use the workaround of searching for a*, b*, c*, etc., especially considering that usernames might not be only in English.

Further reading showed that UnboundID does support paged searches, so the problem is solved. (This is also what the C# code does under the hood: setting PageSize on DirectorySearcher makes it use the Simple Paged Results control, which is why it can get past the server's size limit.)
public static void main(String[] args) {
    try {
        int count = 0;
        LDAPConnection connection = new LDAPConnection("hostname", 389, "user@domain", "password");
        final String path = "OU=Users,DC=org,DC=com";
        String[] attributes = {"SamAccountName", "name"};
        SearchRequest searchRequest = new SearchRequest(path, SearchScope.SUB, Filter.createEqualityFilter("objectClass", "person"), attributes);
        ASN1OctetString resumeCookie = null;
        while (true) {
            searchRequest.setControls(new SimplePagedResultsControl(100, resumeCookie));
            SearchResult searchResult = connection.search(searchRequest);
            for (SearchResultEntry e : searchResult.getSearchEntries()) {
                if (e.hasAttribute("SamAccountName"))
                    System.out.print(count++ + ": " + e.getAttributeValue("SamAccountName"));
                if (e.hasAttribute("name"))
                    System.out.println("->" + e.getAttributeValue("name"));
            }
            LDAPTestUtils.assertHasControl(searchResult, SimplePagedResultsControl.PAGED_RESULTS_OID);
            SimplePagedResultsControl responseControl = SimplePagedResultsControl.get(searchResult);
            if (responseControl.moreResultsToReturn()) {
                resumeCookie = responseControl.getCookie();
            } else {
                break;
            }
        }
    } catch (Exception e) {
        System.out.println(e.toString());
    }
}
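For completeness, the same paged search can also be done with the JDK's own JNDI provider via javax.naming.ldap.PagedResultsControl. The following is only a rough sketch; host, credentials, base DN, filter and page size are placeholders mirroring the question, not tested values:
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;
import javax.naming.ldap.Control;
import javax.naming.ldap.InitialLdapContext;
import javax.naming.ldap.LdapContext;
import javax.naming.ldap.PagedResultsControl;
import javax.naming.ldap.PagedResultsResponseControl;

public class PagedJndiSearch {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://hostname:389");
        env.put(Context.SECURITY_PRINCIPAL, "user@domain");
        env.put(Context.SECURITY_CREDENTIALS, "password");
        LdapContext ctx = new InitialLdapContext(env, null);

        int pageSize = 1000;                       // stay at or below the server's page limit
        byte[] cookie = null;
        ctx.setRequestControls(new Control[] { new PagedResultsControl(pageSize, Control.CRITICAL) });

        SearchControls sc = new SearchControls();
        sc.setSearchScope(SearchControls.SUBTREE_SCOPE);
        sc.setReturningAttributes(new String[] { "sAMAccountName" });

        int count = 0;
        do {
            NamingEnumeration<SearchResult> results =
                    ctx.search("OU=Users,DC=org,DC=com", "(objectClass=person)", sc);
            while (results.hasMore()) {
                SearchResult sr = results.next();
                System.out.println(count++ + ": " + sr.getAttributes().get("sAMAccountName"));
            }
            // read the server's cookie; a non-empty cookie means there are more pages to fetch
            cookie = null;
            Control[] controls = ctx.getResponseControls();
            if (controls != null) {
                for (Control c : controls) {
                    if (c instanceof PagedResultsResponseControl) {
                        cookie = ((PagedResultsResponseControl) c).getCookie();
                    }
                }
            }
            ctx.setRequestControls(new Control[] { new PagedResultsControl(pageSize, cookie, Control.CRITICAL) });
        } while (cookie != null && cookie.length > 0);
        ctx.close();
    }
}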

Related

Why is my Spark driver program so slow?

My problem: I have a model engine that takes a list of parameter configurations and evaluates a double value corresponding to the metric associated with that configuration. I have six parameters, and each of them can vary according to a list. I want to find, by brute force, the best parameter configuration, i.e. the combination that produces the highest value of the output metric. Since I'm learning Spark, I realized that with the cartesian product operation I can easily generate the combinations and split the RDD to be processed in parallel. So, I came up with this driver program:
public static void main(String[] args) {
    String scriptName = "model.mry";
    String scriptStr = null;
    try {
        scriptStr = new String(Files.readAllBytes(Paths.get(scriptName)));
    } catch (IOException ex) {
        Logger.getLogger(BruteForceDriver.class.getName()).log(Level.SEVERE, null, ex);
        System.exit(1);
    }
    final String script = scriptStr;
    SparkConf conf = new SparkConf()
            .setAppName("wordCount")
            .setSparkHome("/home/danilo/bin/spark-2.2.0-bin-hadoop2.7")
            .setJars(new String[]{"/home/danilo/NetBeansProjects/SparkHello1/target/SparkHello1-1.0.jar",
                "/home/danilo/.m2/repository/org/modcs/mercury/4.7/mercury-4.7.jar"})
            .setMaster("spark://danilo-desktop:7077");
    String baseDir = "/home/danilo/NetBeansProjects/SimulationOptimization/workspace/";
    JavaSparkContext sc = new JavaSparkContext(conf);
    final int NUM_SERVICES = 6;
    final int QTD = 3;
    JavaRDD<Service>[] providers = new JavaRDD[NUM_SERVICES];
    for (int i = 1; i <= NUM_SERVICES; i++) {
        providers[i - 1] = sc.textFile(baseDir + "provider" + i + ".mat")
                .filter((t1) -> !t1.contains("#") && !t1.trim().isEmpty())
                .map(Service.createParser("" + i))
                .zipWithIndex()
                .filter((t1) -> {
                    return t1._2 < QTD;
                })
                .keys();
    }
    JavaPairRDD c = null;
    JavaRDD<Service> p = providers[0];
    for (int i = 1; i < NUM_SERVICES; i++) {
        if (c == null) {
            c = p.cartesian(providers[i]);
        } else {
            c = c.cartesian(providers[i]);
        }
    }
    JavaRDD<List<Service>> cartesian = c.map(new FlattenTuple<>());
    final Broadcast<ModelEvaluator> model = sc.broadcast(new ModelEvaluator(script));
    JavaPairRDD<Double, List<Service>> results = cartesian.mapToPair(
            (t) -> {
                try {
                    double val = model.value().evaluateModel(t);
                    System.out.println(val);
                    return new Tuple2<>(val, t);
                } catch (Exception ex) {
                    return null;
                }
            }
    );
    results.sortByKey().collect().forEach((t) -> {
        System.out.println(t._1 + ", " + t._2);
    });
    sc.close();
}
The "QTD" variable allows me to control the size of the interval over which each parameter will vary. For QTD = 3, I'll have 3^6 = 729 combinations. The problem is that it is taking very long to compute all those combinations. I wrote an implementation using only plain Java threads, and the runtime is about 40 seconds. Using my Spark driver program, the runtime is more than 6 minutes. Why is my Spark program so slow compared to the plain multi-threaded Java program?
Edit:
I put:
results = results.cache();
before sorting the results and now the runtime is 2.5 minutes.
Edit 2:
I created an RDD with the cartesian product of the parameters by hand instead of using the operation provided by the framework. Now my runtime is 1'25''. It does make sense now, since there is some overhead in starting the driver and moving the jars to the workers.
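For reference, a minimal sketch of what computing the cartesian product "by hand" could look like, assuming the provider lists are small enough to expand on the driver and have already been parsed into local lists; all names below are illustrative, not the original project's code:
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;

// providersLocal: one List<Service> per provider, each already truncated to QTD entries
List<List<Service>> combinations = new ArrayList<>();
combinations.add(new ArrayList<>());
for (List<Service> provider : providersLocal) {
    List<List<Service>> next = new ArrayList<>();
    for (List<Service> partial : combinations) {
        for (Service s : provider) {
            List<Service> extended = new ArrayList<>(partial);
            extended.add(s);
            next.add(extended);
        }
    }
    combinations = next;
}
// a single parallelize call replaces the chain of cartesian() operations and their shuffles
JavaRDD<List<Service>> cartesian = sc.parallelize(combinations);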

How to get all members of AD group via LDAP in Java

I have written an application that retrieves Active Directory groups and flattens them, i.e. recursively includes the members of subgroups in the top parent group.
It works fine for small groups, but with larger groups I am facing a problem.
If the number of members does not exceed 1500, they are listed in the member attribute. If there are more, then this attribute is empty and an attribute named member;range=0-1499 appears, containing the first 1500 members.
My problem is that I don't know how to get the rest of the member set beyond 1500.
We have groups with 8-12 thousand members. Do I need to run another query?
On the Microsoft site I have seen a C# code snippet on a similar matter, but couldn't make much sense of it, as it showed how to specify a range but not how to plug it into the query. If someone knows how to do it in Java, I'd appreciate a tip.
This will obviously give you the next ones:
String[] returnedAtts = { "member;range=1500-2999" };
You need to fetch the users chunk by chunk (in chunks of 1500). Just keep a counter, update your search, and retrieve the next ones until you have all of them.
With your help I have full working code:
// Initialize
LdapContext ldapContext = null;
NamingEnumeration<SearchResult> results = null;
NamingEnumeration<?> members = null;
try {
    // Initialize properties
    Properties properties = new Properties();
    properties.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
    properties.put(Context.PROVIDER_URL, "ldap://" + ldapUrl);
    properties.put(Context.SECURITY_PRINCIPAL, adminLoginADOnPremise);
    properties.put(Context.SECURITY_CREDENTIALS, adminPasswordADOnPremise);
    // Initialize ldap context
    ldapContext = new InitialLdapContext(properties, null);
    int range = 0;
    boolean finish = false;
    while (finish != true) {
        // Set search controls
        SearchControls searchCtls = new SearchControls();
        searchCtls.setSearchScope(SearchControls.SUBTREE_SCOPE);
        searchCtls.setReturningAttributes(generateRangeArray(range));
        // Get results
        results = ldapContext.search(ldapBaseDn, String.format("(samAccountName=%s)", groupName), searchCtls);
        if (results.hasMoreElements() == true) {
            SearchResult result = results.next();
            try {
                members = result.getAttributes().get(generateRangeString(range)).getAll();
                while (members.hasMore()) {
                    String distinguishedName = (String) members.next();
                    logger.debug(distinguishedName);
                }
                range++;
            } catch (Exception e) {
                // Failure means there are no more results
                finish = true;
            }
        }
    }
} catch (NamingException e) {
    logger.error(e.getMessage());
    throw new Exception(e.getMessage());
} finally {
    if (ldapContext != null) {
        ldapContext.close();
    }
    if (results != null) {
        results.close();
    }
}
Two functions are missing from the working code example by @Nicolas; I guess they would be something like:
public static String[] generateRangeArray(int i) {
    String range = "member;range=" + i * 1500 + "-" + ((i + 1) * 1500 - 1);
    String[] returnedAtts = { range };
    return returnedAtts;
}

public static String generateRangeString(int i) {
    String range = "member;range=" + i * 1500 + "-" + ((i + 1) * 1500 - 1);
    return range;
}
The code does not handle the case where the AD group is not large enough for the member attribute to need "chunking" at all, i.e. when the plain "member" attribute is populated instead.
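For that small-group case, a possible guard (a sketch, not part of Nicolas's code) is to check the plain member attribute first and only fall back to the ranged member;range=... loop above when it is missing or empty:
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;

public static boolean printPlainMembers(Attributes attrs) throws NamingException {
    // returns true if the plain "member" attribute was populated (small group);
    // returns false for large groups, where only member;range=0-1499 is present
    Attribute member = attrs.get("member");
    if (member == null || member.size() == 0) {
        return false;
    }
    NamingEnumeration<?> values = member.getAll();
    while (values.hasMore()) {
        System.out.println((String) values.next());
    }
    return true;
}
Called with result.getAttributes() before entering the ranged loop, it decides which path to take.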

DNS Lookups in Java using JNDI and Default Domain

I am using JNDI in Java to perform DNS lookups in my application to resolve A records, running under Java 8 on Windows 7. However, I am having trouble resolving records unless I specify the complete host name including the domain name.
Java appears to be ignoring the DNS search list which is configured on the PC. I don't have a problem including the domain name, if that is what Java requires, but I can't find a public method to obtain the domains in the search list.
The following SSCCE uses the internal class sun.net.dns.ResolverConfiguration to obtain the DNS search list, but I shouldn't use it, as it is an internal proprietary API and may be removed in a future release.
import java.util.*;
import javax.naming.*;
import javax.naming.directory.*;

public class SSCCE {
    public static void main(String[] args) {
        String[] hostsToLookup = new String[] { "testhost", "testhost.mydomain.com" };
        try {
            System.out.println("DNS Search List:");
            for (Object o : sun.net.dns.ResolverConfiguration.open().searchlist()) {
                System.out.println(" " + o);
            }
            Properties p = new Properties();
            p.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
            InitialDirContext idc = new InitialDirContext(p);
            for (String h : hostsToLookup) {
                System.out.println("Host: " + h);
                try {
                    Attributes attrs = idc.getAttributes(h, new String[] { "A" });
                    Attribute attr = attrs.get("A");
                    if (attr != null) {
                        for (int i = 0; i < attr.size(); i++) {
                            System.out.println(" " + attr.get(i));
                        }
                    }
                } catch (NameNotFoundException e) {
                    System.out.println(" undefined");
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
When I run this using just the host part it doesn't resolve, but when I manually add the domain from the search list then it does:
DNS Search List:
mydomain.com
Host: testhost
undefined
Host: testhost.mydomain.com
192.0.2.1
Is it possible to either make Java honour the DNS search list using JNDI, or is there a public method to obtain the DNS search list?
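One possible workaround, assuming the suffix search list can be supplied from configuration (since there seems to be no public API to read it), is simply to retry the lookup with each suffix appended. The helper below is purely illustrative and reuses the InitialDirContext from the SSCCE; the search list argument is an assumption:
import java.util.ArrayList;
import java.util.List;
import javax.naming.NameNotFoundException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.InitialDirContext;

static Attribute lookupA(InitialDirContext idc, String host, List<String> searchList) throws Exception {
    List<String> candidates = new ArrayList<>();
    candidates.add(host);                               // try the bare name first
    if (!host.contains(".")) {
        for (String suffix : searchList) {
            candidates.add(host + "." + suffix);        // then host.suffix for each configured domain
        }
    }
    for (String candidate : candidates) {
        try {
            Attributes attrs = idc.getAttributes(candidate, new String[] { "A" });
            Attribute attr = attrs.get("A");
            if (attr != null) {
                return attr;
            }
        } catch (NameNotFoundException ignored) {
            // not found under this suffix; try the next candidate
        }
    }
    return null;                                        // unresolvable with the given search list
}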

Java ExecutorService Runnable doesn't update value

I'm using Java to download the HTML contents of websites whose URLs are stored in a database. I'd like to put their HTML into the database, too.
I'm using Jsoup for this purpose:
public String downloadHTML(String byLink) {
    String htmlInPage = "";
    try {
        Document doc = Jsoup.connect(byLink).get();
        htmlInPage = doc.html();
    } catch (org.jsoup.UnsupportedMimeTypeException e) {
        // process this exception
    } catch (IOException e) {
        // process other connection/parsing problems
    }
    return htmlInPage;
}
I'd like to download websites concurrently and use this function:
public void downloadURL(int websiteId, String url,
                        String categoryName, ExecutorService executorService) {
    executorService.submit((Runnable) () -> {
        String htmlInPage = downloadHTML(url);
        System.out.println("Category: " + categoryName + " " + websiteId + " " + url);
        String insertQuery =
                "INSERT INTO html_data (website_id, html_contents) VALUES (?,?)";
        dbUtils.query(insertQuery, websiteId, htmlInPage);
    });
}
dbUtils is my class based on Apache Commons DbUtils. Details are here: http://pastebin.com/iAKXchbQ
And I'm using everything mentioned above in the following way (the List<Object[]> details are explained on pastebin, too):
public static void main(String[] args) {
    DbUtils dbUtils = new DbUtils("host", "db", "driver", "user", "pass");
    List<String> categoriesList =
            Arrays.asList("weapons", "planes", "cooking", "manga");
    String sql = "SELECT lw.id, lw.website_url, category_name " +
            "FROM list_of_websites AS lw JOIN list_of_categories AS lc " +
            "ON lw.category_id = lc.id " +
            "where category_name = ? ";
    ExecutorService executorService = Executors.newFixedThreadPool(10);
    for (String category : categoriesList) {
        List<Object[]> sitesInCategory = dbUtils.select(sql, category);
        for (Object[] entry : sitesInCategory) {
            int websiteId = (int) entry[0];
            String url = (String) entry[1];
            String categoryName = (String) entry[2];
            downloadURL(websiteId, url, categoryName, executorService);
        }
    }
    executorService.shutdown();
}
I'm not sure if this solution is correct, but it works. Now I want to modify the code to save the HTML not of all websites in my database, but only of a fixed number in each category.
For example, download and save the HTML of 50 websites from the "weapons" category, 50 from "planes", etc. I don't think it's necessary to use SQL for this purpose: even if we select 50 sites per category, it doesn't mean we save them all, because of possibly incorrect syntax and connection problems.
I've tried to create a separate class implementing Runnable with the fields counter and maxWebsitesPerCategory, but these variables aren't updated. Another idea was to create a field Map<String, Integer> sitesInCategory instead of a counter, put each category there as a key and increment its value until it reaches maxWebsitesPerCategory, but that didn't work either. Please help me!
P.S.: I'd also be grateful for any recommendations about my implementation of concurrent downloading (I haven't worked with concurrency in Java before and this is my first attempt).
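As background for the counter problem described above: a counter field only works if all tasks share the same instance and the type is thread-safe, otherwise updates are lost or simply not visible. A minimal sketch, with illustrative names (this is not the question's class), of how such a per-category limit could be tracked:
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

class CategoryLimiter {
    private final ConcurrentMap<String, AtomicInteger> saved = new ConcurrentHashMap<>();
    private final int maxPerCategory;

    CategoryLimiter(int maxPerCategory) {
        this.maxPerCategory = maxPerCategory;           // e.g. 50, as in the question
    }

    // call from the submitted task around the DB insert; returns false once the category is full
    boolean tryAcquire(String category) {
        int n = saved.computeIfAbsent(category, k -> new AtomicInteger()).incrementAndGet();
        return n <= maxPerCategory;
    }
}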
How about this?
for (String category : categoriesList) {
    dbUtils.select(sql, category).stream()
            .limit(50)
            .forEach(entry -> {
                int websiteId = (int) entry[0];
                String url = (String) entry[1];
                String categoryName = (String) entry[2];
                downloadURL(websiteId, url, categoryName, executorService);
            });
}
sitesInCategory has been replaced with a stream of at most 50 elements, then your code is run on each entry.
EDIT
In regard to the comments, I've gone ahead and restructured a bit; you can modify/implement the content of the methods I've suggested.
public void werk(Queue<Object[]> q, ExecutorService executorService) {
    executorService.submit(() -> {
        try {
            Object[] o = q.remove();
            try {
                String html = downloadHTML(o); // this takes one of your object arrays and returns the text of an html page
                insertIntoDB(html); // this is the code in the latter half of your downloadURL method
            } catch (/*narrow exception type indicating download failure*/Exception e) {
                werk(q, executorService);
            }
        } catch (NoSuchElementException e) {}
    });
}
^^^ This method does most of the work.
for (String category : categoriesList) {
    Queue<Object[]> q = new ConcurrentLinkedQueue<>(dbUtils.select(sql, category));
    IntStream.range(0, 50).forEach(i -> werk(q, executorService));
}
^^^ this is the for loop in your main
Now each category tries to download 50 pages; upon failure of downloading a page, it moves on and tries to download another one. In this way, you will either download 50 pages or have attempted to download all pages in the category.
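One more note: for the retry in werk() to ever fire, downloadHTML has to propagate failures instead of swallowing them and returning an empty string. A hedged variant of the question's method that does so:
import java.io.IOException;
import org.jsoup.Jsoup;

public String downloadHTML(String byLink) throws IOException {
    // let connection/parsing failures propagate so the caller can retry with another URL
    return Jsoup.connect(byLink).get().html();
}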

Solr Performance for many documents query

I want to have Solr always retrieve all documents found by a search (I know Solr wasn't built for that, but anyway), and I am currently doing it with this code:
...
QueryResponse response = solr.query(query);
int offset = 0;
int totalResults = (int) response.getResults().getNumFound();
List<Article> ret = new ArrayList<Article>(totalResults);
query.setRows(FETCH_SIZE);
while (offset < totalResults) {
    //requires an int? wtf?
    query.setStart(offset);
    int left = totalResults - offset;
    if (left < FETCH_SIZE) {
        query.setRows(left);
    }
    response = solr.query(query);
    List<Article> current = response.getBeans(Article.class);
    offset += current.size();
    ret.addAll(current);
}
...
This works, but it is pretty slow if a query gets over 1000 hits (I've read about that on here; it is caused by Solr because I am setting the start every time, which, for some reason, takes some time). What would be a nicer (and faster) way to do this?
To improve on the suggested answer you could use a streamed response. This has been added especially for the case where one fetches all results. As you can see in Solr's Jira, that guy wants to do the same as you do. This has been implemented for Solr 4.
This is also described in Solrj's javadoc.
Solr will pack the response and create a whole XML/JSON document before it starts sending the response. Then your client is required to unpack all that and offer it as a list to you. By using streaming and parallel processing, which you can do when using such a queued approach, the performance should improve further.
Yes, you will lose the automatic bean mapping, but as performance is a factor here, I think this is acceptable.
Here is a sample unit test:
public class StreamingTest {

    @Test
    public void streaming() throws SolrServerException, IOException, InterruptedException {
        HttpSolrServer server = new HttpSolrServer("http://your-server");
        SolrQuery tmpQuery = new SolrQuery("your query");
        tmpQuery.setRows(Integer.MAX_VALUE);
        final BlockingQueue<SolrDocument> tmpQueue = new LinkedBlockingQueue<SolrDocument>();
        server.queryAndStreamResponse(tmpQuery, new MyCallbackHander(tmpQueue));
        SolrDocument tmpDoc;
        do {
            tmpDoc = tmpQueue.take();
        } while (!(tmpDoc instanceof PoisonDoc));
    }

    private class PoisonDoc extends SolrDocument {
        // marker to finish queuing
    }

    private class MyCallbackHander extends StreamingResponseCallback {
        private BlockingQueue<SolrDocument> queue;
        private long currentPosition;
        private long numFound;

        public MyCallbackHander(BlockingQueue<SolrDocument> aQueue) {
            queue = aQueue;
        }

        @Override
        public void streamDocListInfo(long aNumFound, long aStart, Float aMaxScore) {
            // called before start of streaming
            // probably use for some statistics
            currentPosition = aStart;
            numFound = aNumFound;
            if (numFound == 0) {
                queue.add(new PoisonDoc());
            }
        }

        @Override
        public void streamSolrDocument(SolrDocument aDoc) {
            currentPosition++;
            System.out.println("adding doc " + currentPosition + " of " + numFound);
            queue.add(aDoc);
            if (currentPosition == numFound) {
                queue.add(new PoisonDoc());
            }
        }
    }
}
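The test above only drains the queue on the calling thread. A rough sketch of consuming it with a small worker pool instead, and mapping each document back to an Article bean by hand with SolrJ's DocumentObjectBinder; the worker count, the process() call and making PoisonDoc reachable from here (e.g. as a static class) are assumptions:
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.solr.client.solrj.beans.DocumentObjectBinder;
import org.apache.solr.common.SolrDocument;

static void consume(BlockingQueue<SolrDocument> queue, int workers) {
    ExecutorService pool = Executors.newFixedThreadPool(workers);
    DocumentObjectBinder binder = new DocumentObjectBinder();
    for (int i = 0; i < workers; i++) {
        pool.submit(() -> {
            try {
                SolrDocument doc;
                while (!((doc = queue.take()) instanceof PoisonDoc)) {
                    Article article = binder.getBean(Article.class, doc);   // manual bean mapping
                    process(article);                                       // whatever you do per result (assumed)
                }
                queue.put(doc);   // put the poison marker back so the other workers stop too
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }
    pool.shutdown();
}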
You might improve performance by increasing FETCH_SIZE. Since you are getting all the results, pagination doesn't make sense unless you are concerned with memory or some such. If 1000 results are liable to cause a memory overflow, I'd say your current performance seems pretty outstanding though.
So I would try getting everything at once, simplifying this to something like:
//WHOLE_BUNCHES is a constant representing a reasonable max number of docs we want to pull here.
//Integer.MAX_VALUE would probably invite an OutOfMemoryError, but that would be true of the
//implementation in the question anyway, since they were still being stored in the list at the end.
query.setRows(WHOLE_BUNCHES);
QueryResponse response = solr.query(query);
int totalResults = (int) response.getResults().getNumFound(); //If you even still need this figure.
List<Article> ret = response.getBeans(Article.class);
If you need to keep the pagination though:
You are performing this first query:
QueryResponse response = solr.query(query);
and are populating the number of found results from it, but you are not pulling any results with the response. Even if you keep pagination here, you could at least eliminate one extra query here.
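A hedged sketch of that, keeping the pagination but reusing the first response for both numFound and the first page (FETCH_SIZE, Article and the SolrServer are assumed to match the question's setup):
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;

static List<Article> fetchAll(SolrServer solr, SolrQuery query, int fetchSize) throws SolrServerException {
    query.setRows(fetchSize);
    QueryResponse response = solr.query(query);              // first page and numFound in one round trip
    int totalResults = (int) response.getResults().getNumFound();
    List<Article> ret = new ArrayList<Article>(totalResults);
    ret.addAll(response.getBeans(Article.class));
    int offset = ret.size();
    while (offset < totalResults) {
        query.setStart(offset);
        response = solr.query(query);
        List<Article> current = response.getBeans(Article.class);
        if (current.isEmpty()) {
            break;                                           // defensive: the index may have shrunk between pages
        }
        offset += current.size();
        ret.addAll(current);
    }
    return ret;
}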
This:
int left = totalResults - offset;
if (left < FETCH_SIZE) {
    query.setRows(left);
}
Is unnecessary. setRows specifies a maximum number of rows to return, so asking for more than are available won't cause any problems.
Finally, apropos of nothing, but I have to ask: what argument would you expect setStart to take if not an int?
Use the logic below to fetch Solr data in batches, to optimize the performance of the Solr data fetch query:
public List<Map<String, Object>> getData(int id, Set<String> fields) throws SolrServerException {
    final int SOLR_QUERY_MAX_ROWS = 3;
    long start = System.currentTimeMillis();
    SolrQuery query = new SolrQuery();
    String queryStr = "id:" + id;
    LOG.info(queryStr);
    query.setQuery(queryStr);
    query.setRows(SOLR_QUERY_MAX_ROWS);
    QueryResponse rsp = server.query(query, SolrRequest.METHOD.POST);
    List<Map<String, Object>> mapList = null;
    if (rsp != null) {
        long total = rsp.getResults().getNumFound();
        System.out.println("Total count found: " + total);
        // Solr query batch
        mapList = new ArrayList<Map<String, Object>>();
        if (total <= SOLR_QUERY_MAX_ROWS) {
            addAllData(mapList, rsp, fields);
        } else {
            // the first chunk has already been fetched above
            addAllData(mapList, rsp, fields);
            int marker = SOLR_QUERY_MAX_ROWS;
            while (marker < total) {
                query.setStart(marker);
                rsp = server.query(query, SolrRequest.METHOD.POST);
                addAllData(mapList, rsp, fields);
                marker = marker + SOLR_QUERY_MAX_ROWS;
            }
        }
    }
    long end = System.currentTimeMillis();
    LOG.debug("SOLR Performance: getData: " + (end - start));
    return mapList;
}

private void addAllData(List<Map<String, Object>> mapList, QueryResponse rsp, Set<String> fields) {
    for (SolrDocument sdoc : rsp.getResults()) {
        Map<String, Object> map = new HashMap<String, Object>();
        for (String field : fields) {
            map.put(field, sdoc.getFieldValue(field));
        }
        mapList.add(map);
    }
}
