Flux - parallel flatMap with WebClient - limit to fixed batched rate - Java

The code I have is this:
return Flux.fromIterable(new Generator()).log()
    .flatMap(
        s ->
            webClient
                .head()
                .uri(
                    MessageFormat.format(
                        "/my-{2,number,#00}.xml",
                        channel, timestamp, s))
                .exchangeToMono(r -> Mono.just(r.statusCode()))
                .filter(HttpStatus::is2xxSuccessful)
                .map(r -> s),
        6) // only request 6 segments in parallel via webClient
    .take(6) // we need only 6 200 OK responses
    .sort();
It just issues HEAD requests until the first 6 of them are successful.
Parallelization works here, but the problem is that as soon as one of the requests completes, the next request is triggered immediately (to maintain a parallelization level of 6). What I need is a parallelization level of 6, but in batches: trigger 6 requests, wait until all of them complete, then trigger the next 6, and so on.
This is the output of the log() above:
: | request(6)
: | onNext(7)
: | onNext(17)
: | onNext(27)
: | onNext(37)
: | onNext(47)
: | onNext(57)
: | request(1) <---- from here NOT OK; wait until all complete and request(6)
: | onNext(8)
: | request(1)
: | onNext(18)
: | request(1)
: | onNext(28)
: | request(1)
: | onNext(38)
: | request(1)
: | onNext(48)
: | request(1)
: | onNext(58)
: | cancel()
UPDATE
This is what I tried with buffer():
return Flux.fromIterable(new Generator())
    .buffer(6)
    .flatMap(Flux::fromIterable)
    .log()
    .flatMap(
        s ->
            webClient
                .head()
                .uri(
                    MessageFormat.format(
                        "/my-{2,number,#00}.xml",
                        channel, timestamp, s))
                .exchangeToMono(r -> Mono.just(r.statusCode()))
                .filter(HttpStatus::is2xxSuccessful)
                .map(r -> s),
        6) // only request 6 segments in parallel via webClient
    .take(6)
    .sort();

OK, it seems I now have code that works (the buffer attempt above changes nothing, because flatMap(Flux::fromIterable) just flattens the batches back into a plain stream). Here I use window():
return Flux.fromIterable(new Generator())
    .window(6) // group 1,2,3,4,5,6,7... into [0,1,2,3,4,5],[6,7,...,11],[12,...,17]
    .log()
    .flatMap(
        s -> s.log().flatMap(
            x -> webClient
                .head()
                .uri(
                    MessageFormat.format(
                        "/my-{2,number,#00}.xml",
                        channel, timestamp, x))
                .exchangeToMono(r -> Mono.just(r.statusCode()))
                .filter(HttpStatus::is2xxSuccessful)
                .map(r -> x),
            6), // 6 = request all elements of one window (0,1,2,3,4,5) in parallel
        1) // 1 = subscribe to only one window at a time ([0,1,2,3,4,5])
    .take(6, true) // pass through only 6 elements (cancel afterwards)
    .sort();
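For reference, an equivalent batching approach (an untested sketch, reusing the same Generator, webClient, channel and timestamp from above) replaces window plus the outer flatMap with buffer plus concatMap; concatMap subscribes to the next batch only after the previous one has completed:
return Flux.fromIterable(new Generator())
    .buffer(6) // collect the ids into batches of 6
    .concatMap(batch ->
        Flux.fromIterable(batch)
            .flatMap(s -> webClient
                .head()
                .uri(
                    MessageFormat.format(
                        "/my-{2,number,#00}.xml",
                        channel, timestamp, s))
                .exchangeToMono(r -> Mono.just(r.statusCode()))
                .filter(HttpStatus::is2xxSuccessful)
                .map(r -> s),
                6)) // run the whole batch of 6 in parallel
    .take(6, true)
    .sort();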

Related

Why does Reactor not process each element concurrently?

What I expect is that Reactor will handle each emitted element on its own thread, meaning elements 1, 2, 3, 4, 5 from the source should be processed concurrently. But that is not what the output of my demo code shows, and I don't know why. Could someone take a look and explain two things to me:
Why does Reactor handle the elements in my demo code synchronously?
How can I make Reactor handle each element concurrently?
Although the reactive chain in my demo code below runs asynchronously to the main thread, each element from the source Flux is emitted and processed one after another.
Here is my demo code
System.out.println("main thread start ...");
Flux.range(1,5)
.flatMap(num->{
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
return Mono.just(num);
}).flatMap(num-> Mono.just(num*10) )
.subscribeOn(Schedulers.boundedElastic())
.subscribe(res-> System.out.println("Thread name:" + Thread.currentThread().getName()+" value:" + res));
System.out.println("main thread sleep a little");
Thread.sleep(4000);
System.out.println("main thread end ...");
Here is the output:
main thread start ...
main thread sleep a little
0. element: 0
1. element: 1
main thread end ...
2. element: 2
Your code is not really implemented in a reactive way, especially the wrapping of non-reactive (blocking) code. Consider this slightly reworked example:
@Test
void concurrent() {
    Flux<Integer> stream = Flux.range(1, 50)
            .flatMap(this::process)
            .flatMap(num -> Mono.just(num * 10));

    StepVerifier.create(stream)
            .expectNextCount(50)
            .verifyComplete();
}

private Mono<Integer> process(int num) {
    return Mono.fromCallable(() -> {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                return num;
            })
            .log()
            .subscribeOn(Schedulers.boundedElastic());
}
If you run it, you will see that execution happens on multiple threads, with concurrency capped by Queues.SMALL_BUFFER_SIZE (the default flatMap concurrency):
22:54:26.066 [boundedElastic-50] INFO [r.M.L.50] - | request(32)
22:54:27.038 [boundedElastic-6] INFO [r.M.L.6] - | onNext(6)
22:54:27.038 [boundedElastic-11] INFO [r.M.L.11] - | onNext(11)
22:54:27.038 [boundedElastic-2] INFO [r.M.L.2] - | onNext(2)
22:54:27.038 [boundedElastic-9] INFO [r.M.L.9] - | onNext(9)
22:54:27.038 [boundedElastic-3] INFO [r.M.L.3] - | onNext(3)
22:54:27.040 [boundedElastic-13] INFO [r.M.L.13] - | onNext(13)
22:54:27.040 [boundedElastic-7] INFO [r.M.L.7] - | onNext(7)
22:54:27.040 [boundedElastic-20] INFO [r.M.L.20] - | onNext(20)
22:54:27.041 [boundedElastic-6] INFO [r.M.L.6] - | onComplete()
22:54:27.041 [boundedElastic-13] INFO [r.M.L.13] - | onComplete()
22:54:27.043 [boundedElastic-2] INFO [r.M.L.2] - | onComplete()
22:54:27.043 [boundedElastic-3] INFO [r.M.L.3] - | onComplete()
22:54:27.043 [boundedElastic-20] INFO [r.M.L.20] - | onComplete()
22:54:27.043 [boundedElastic-11] INFO [r.M.L.11] - | onComplete()
22:54:27.043 [boundedElastic-7] INFO [r.M.L.7] - | onComplete()
22:54:27.044 [boundedElastic-9] INFO [r.M.L.9] - | onComplete()
22:54:27.045 [boundedElastic-1] INFO [r.M.L.1] - | onNext(1)
22:54:27.045 [boundedElastic-5] INFO [r.M.L.5] - | onNext(5)
22:54:27.045 [boundedElastic-15] INFO [r.M.L.15] - | onNext(15)
22:54:27.045 [boundedElastic-1] INFO [r.M.L.1] - | onComplete()
22:54:27.045 [boundedElastic-15] INFO [r.M.L.15] - | onComplete()
Also, you could check Flight of the Flux 3 - Hopping Threads and Schedulers for a more detailed explanation and more examples.
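If you also want to cap how many elements are processed in parallel rather than relying on the default, flatMap accepts an explicit concurrency argument - a minimal sketch based on the test above:
Flux<Integer> stream = Flux.range(1, 50)
    .flatMap(this::process, 6) // at most 6 inner Monos are subscribed at a time
    .flatMap(num -> Mono.just(num * 10));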

SearchRequest in RootDSE

I have the following function to query users from an AD server:
public List<LDAPUserDTO> getUsersWithPaging(String filter)
{
    List<LDAPUserDTO> userList = new ArrayList<>();
    try (LDAPConnection connection = new LDAPConnection(config.getHost(), config.getPort(), config.getUsername(), config.getPassword()))
    {
        SearchRequest searchRequest = new SearchRequest("", SearchScope.SUB, filter, null);
        ASN1OctetString resumeCookie = null;
        while (true)
        {
            searchRequest.setControls(
                    new SimplePagedResultsControl(100, resumeCookie));
            SearchResult searchResult = connection.search(searchRequest);
            for (SearchResultEntry e : searchResult.getSearchEntries())
            {
                LDAPUserDTO tmp = new LDAPUserDTO();
                tmp.distinguishedName = e.getAttributeValue("distinguishedName");
                tmp.name = e.getAttributeValue("name");
                userList.add(tmp);
            }
            LDAPTestUtils.assertHasControl(searchResult,
                    SimplePagedResultsControl.PAGED_RESULTS_OID);
            SimplePagedResultsControl responseControl =
                    SimplePagedResultsControl.get(searchResult);
            if (responseControl.moreResultsToReturn())
            {
                resumeCookie = responseControl.getCookie();
            }
            else
            {
                break;
            }
        }
        return userList;
    } catch (LDAPException e) {
        logger.error(e.getExceptionMessage());
        return null;
    }
}
However, this breaks when I try to search on the RootDSE.
What I've tried so far:
baseDN = null
baseDN = "";
baseDN = RootDSE.getRootDSE(connection).getDN()
baseDN = "RootDSE"
All resulting in various exceptions or empty results:
Caused by: LDAPSDKUsageException(message='A null object was provided where a non-null object is required (non-null index 0).
2020-04-01 10:42:22,902 ERROR [de.dbz.service.LDAPService] (default task-1272) LDAPException(resultCode=32 (no such object), numEntries=0, numReferences=0, diagnosticMessage='0000208D: NameErr: DSID-03100213, problem 2001 (NO_OBJECT), data 0, best match of:
''
', ldapSDKVersion=4.0.12, revision=aaefc59e0e6d110bf3a8e8a029adb776f6d2ce28')
So, I really spent a lot of time on this. It is possible to query the RootDSE, but it's not as straightforward as one might think.
I mainly used Wireshark to see what the guys at Softerra are doing with their LDAP Browser.
Turns out I wasn't that far away:
As you can see, the baseObject is empty here.
Also, there is one additional Control with the OID LDAP_SERVER_SEARCH_OPTIONS_OID and the ASN.1 String 308400000003020102.
So what does this 308400000003020102 (more readably: 30 84 00 00 00 03 02 01 02) actually do?
First of all, we decode it into something we can read - in this case, the int 2.
In binary, this gives us: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
As we know from the documentation, we have the following notation:
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
|---|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-------|-------|
| x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | SSFPR | SSFDS |
or we just take the int values from the documentation:
1 = SSFDS -> SERVER_SEARCH_FLAG_DOMAIN_SCOPE
2 = SSFPR -> SERVER_SEARCH_FLAG_PHANTOM_ROOT
So, in my example, we have SSFPR which is defined as follows:
For AD DS, instructs the server to search all NC replicas except
application NC replicas that are subordinate to the search base, even
if the search base is not instantiated on the server. For AD LDS, the
behavior is the same except that it also includes application NC
replicas in the search. For AD DS and AD LDS, this will cause the
search to be executed over all NC replicas (except for application NCs
on AD DS DCs) held on the DC that are subordinate to the search base.
This enables search bases such as the empty string, which would cause
the server to search all of the NC replicas (except for application
NCs on AD DS DCs) that it holds.
NC stands for Naming Context, and those are stored as an operational attribute in the RootDSE with the name namingContexts.
The other value, SSFDS does the following:
Prevents continuation references from being generated when the search
results are returned. This performs the same function as the
LDAP_SERVER_DOMAIN_SCOPE_OID control.
So, someone might ask why I even do this. As it turns out, I have a customer with several sub-DCs under one DC. If I tell the search to follow referrals, the execution time is far too long, so that wasn't really an option for me. But when I turned that off, I wasn't getting all the results when I defined the baseDN to be the group whose members I wanted to retrieve.
Searching via the RootDSE option in Softerra's LDAP Browser was way faster and returned the results in less than one second.
I personally don't have any clue why this is so much faster - Active Directory without any interface or tool from Microsoft is kind of black magic to me anyway. But to be frank, that's not really my area of expertise.
In the end, I ended up with the following Java code:
SearchRequest searchRequest = new SearchRequest("", SearchScope.SUB, filter, null);
[...]
Control globalSearch = new Control("1.2.840.113556.1.4.1340", true, new ASN1OctetString(Hex.decode("308400000003020102")));
searchRequest.setControls(new SimplePagedResultsControl(100, resumeCookie, true),globalSearch);
[...]
The Hex.decode() used here is org.bouncycastle.util.encoders.Hex.
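For readability, the same control value (a SEQUENCE containing the INTEGER 2) can presumably also be built with the UnboundID ASN.1 classes instead of the hard-coded hex string - a sketch, assuming com.unboundid.asn1.ASN1Sequence and ASN1Integer (note that the DER encoder emits the short-form length 30 03 02 01 02, which is semantically equivalent to the long-form bytes captured from Softerra):
ASN1Sequence flags = new ASN1Sequence(new ASN1Integer(2)); // 2 = SSFPR
Control globalSearch = new Control("1.2.840.113556.1.4.1340", true,
        new ASN1OctetString(flags.encode()));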
A huge thanks to the guys at Softerra, who more or less put my journey into the abyss of AD to an end.
You can't query users from the RootDSE.
Use either a domain, or, if you need to query users across domains in a forest, use the global catalog (which runs on different ports, not the default 389/636 for LDAP(S)).
The RootDSE only contains metadata. This question should probably be asked elsewhere for more information, but first read up on the documentation from Microsoft, e.g.:
https://learn.microsoft.com/en-us/windows/win32/ad/where-to-search
https://learn.microsoft.com/en-us/windows/win32/adschema/rootdse
E.g. the namingContexts attribute can be read to find which other contexts you may want to query for actual users.
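A small sketch of reading those contexts with the UnboundID SDK already used above (getNamingContextDNs() is assumed here):
RootDSE rootDSE = connection.getRootDSE();
for (String nc : rootDSE.getNamingContextDNs()) {
    System.out.println("naming context: " + nc); // e.g. DC=example,DC=com
}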
Maybe start with this nice article as an introduction:
http://cbtgeeks.com/2016/06/02/what-is-rootdse/

Understanding the output of PrintRelocations in HotSpot

I am trying to understand how a method is stored in a CodeBuffer (divided into three different sections) while it is being compiled in the HotSpot VM. I have printed the relocation information, but I don't fully understand it. This is my output:
3918 1 1 Demo::workload (2 bytes)
// Relocation information in 3 sections of CodeBuffer
CodeBuffer:
consts.code = 0x00007fffe11f27a0 : 0x00007fffe11f27a0 : 0x00007fffe11ff470 (0 of 52432)
consts.locs = 0x00007fffac008910 : 0x00007fffac008910 : 0x00007fffac008918 (0 of 4) point=0
#0x00007fffac008910:
insts.code = 0x00007fffe11727a0 : 0x00007fffe11727f2 : 0x00007fffe11f2320 (82 of 523136)
insts.locs = 0x00007fffac0087b0 : 0x00007fffac0087b4 : 0x00007fffac008904 (2 of 170) point=77
#0x00007fffac0087b0: b416
relocInfo#0x00007fffac0087b0 [type=11(poll_return) addr=0x00007fffe11727b6 offset=22 format=1]
#0x00007fffac0087b2: 6437
relocInfo#0x00007fffac0087b2 [type=6(runtime_call) addr=0x00007fffe11727ed offset=55 format=1] | [destination=0x00007fffe116f1e0]
#0x00007fffac0087b4:
stubs.code = 0x00007fffe11f2340 : 0x00007fffe11f23f5 : 0x00007fffe11f2780 (181 of 1088)
stubs.locs = 0x00007fffb4009f90 : 0x00007fffb4009faa : 0x00007fffb4009fcc (13 of 30) point=176
#0x00007fffb4009f90: 6428
relocInfo#0x00007fffb4009f90 [type=6(runtime_call) addr=0x00007fffe11f2368 offset=40 format=1] | [destination=0x00007fffe12037e0]
#0x00007fffb4009f92: f803f6ad80097fff705b
relocInfo#0x00007fffb4009f9a [type=7(external_word) addr=0x00007fffe11f23c3 offset=91 data={f6ad80097fff}] | [target=0x00007ffff6ad8009]
#0x00007fffb4009f9c: f060800a
relocInfo#0x00007fffb4009f9e [type=8(internal_word) addr=0x00007fffe11f23cd offset=10 data=96] | [target=0x00007fffe11f236d]
#0x00007fffb4009fa0: 6411
relocInfo#0x00007fffb4009fa0 [type=6(runtime_call) addr=0x00007fffe11f23de offset=17 format=1] | [destination=0x00007ffff676dc98]
#0x00007fffb4009fa2: f801fd729006
relocInfo#0x00007fffb4009fa6 [type=9(section_word) addr=0x00007fffe11f23e4 offset=6 data=-654] | [target=0x00007fffe11f23e4]
#0x00007fffb4009fa8: 640c
relocInfo#0x00007fffb4009fa8 [type=6(runtime_call) addr=0x00007fffe11f23f0 offset=12 format=1] | [destination=0x00007fffe110f2e0]
#0x00007fffb4009faa:
So the addresses after # are the addresses of each relocation entry. But what does the 4-digit hex value after the colon specify? And what is all the info after it?

Elasticsearch - how to group by and count matches in an index

I have an instance of Elasticsearch running with thousands of documents. My index has 2 fields like this:
| Type    | Date_added              |
|---------|-------------------------|
| walking | 2018-11-27T00:00:00.000 |
| walking | 2018-11-26T00:00:00.000 |
| running | 2018-11-24T00:00:00.000 |
| running | 2018-11-25T00:00:00.000 |
| walking | 2018-11-27T04:00:00.000 |
I want to group by the "type" field and count how many matches were found, within a certain date range.
In SQL I would do something like this:
select type,
count(type)
from index
where date_added between '2018-11-20' and '2018-11-30'
group by type
I want to get something like this:
| type    | count |
|---------|-------|
| running | 2     |
| walking | 3     |
I'm using the High Level REST Client API in my project. So far my query looks like this; it only filters by the start and end time:
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders
        .boolQuery()
        .must(QueryBuilders
                .rangeQuery("date_added")
                .from(start.getTime())
                .to(end.getTime())));
How can I do a "group by" on the "type" field? Is it possible to do this in Elasticsearch?
That's a good start! Now you need to add a terms aggregation to your query:
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.boolQuery()
        .must(QueryBuilders
                .rangeQuery("date_added")
                .from(start.getTime())
                .to(end.getTime())));

// add these two lines
TermsAggregationBuilder groupBy = AggregationBuilders.terms("byType").field("type.keyword");
sourceBuilder.aggregation(groupBy);
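Since only the aggregation is needed, you can then skip the hits and execute the search - a sketch, assuming your index is called "index" and your client is a RestHighLevelClient named client:
sourceBuilder.size(0); // we only care about the aggregation, not the hits
SearchRequest searchRequest = new SearchRequest("index").source(sourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);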
After using Val's reply to aggregate the field, I wanted to print the aggregation buckets of my query together with their values. Here's what I did:
Terms terms = searchResponse.getAggregations().get("byType");
Collection<Terms.Bucket> buckets = (Collection<Bucket>) terms.getBuckets();
for (Bucket bucket : buckets) {
    System.out.println("Type: " + bucket.getKeyAsString() + " = Count(" + bucket.getDocCount() + ")");
}
This is the output after running the query on an index with 2700 documents that have a field called "type" with 2 different types:
Type: walking = Count(900)
Type: running = Count(1800)

Why were the messages delayed in NSQ? It's unexpected.

I use NSQ as the message queue in my project: a Java producer publishes messages to NSQ and a Go consumer consumes them. The strange thing is that the consumer always receives the messages a few seconds late. There are only a few messages, so I really can't explain why this happens.
Here is the test result; please pay attention to the timestamps. Both consume the same topic. You can see that the second time, Go is 7 s slower than Java.
Java result:
INFO | jvm 1 | 2018/07/11 17:22:01 | Msg
receive:{"did":"XSQ000200000005","msg":{"id":"5560","type":1,"content":"ZBINh6CBsLw7k2xjr1wslSjY+5QavEgYU6AzzLZn0lOgON9ZYHnNP4UJVUGB+/SpsxZQnrWR9PlULzpSP/p9l9t8wiAwj8qhznRaT8jeyx1/EUrDE0oXJB8GxWaLJUICCbC92j4BMA2HU8vgcfDOp9nSy1KFafi9zgFiCf9Igqo="}}
INFO | jvm 1 | 2018/07/11 17:22:11 | Msg
receive:{"did":"XSQ000200000005","msg":{"id":"5560","type":1,"content":"ZBINh6CBsLw7k2xjr1wslSjY+5QavEgYU6AzzLZn0lOgON9ZYHnNP4UJVUGB+/SpsxZQnrWR9PlULzpSP/p9l9t8wiAwj8qhznRaT8jeyx1/EUrDE0oXJB8GxWaLJUICCbC92j4BMA2HU8vgcfDOp9nSy1KFafi9zgFiCf9Igqo="}}
INFO | jvm 1 | 2018/07/11 17:23:21 | Msg
receive:{"did":"XSQ000200000005","msg":{"id":"5560","type":1,"content":"ZBINh6CBsLw7k2xjr1wslSjY+5QavEgYU6AzzLZn0lOgON9ZYHnNP4UJVUGB+/SpsxZQnrWR9PlULzpSP/p9l9t8wiAwj8qhznRaT8jeyx1/EUrDE0oXJB8GxWaLJUICCbC92j4BMA2HU8vgcfDOp9nSy1KFafi9zgFiCf9Igqo="}}
INFO | jvm 1 | 2018/07/11 17:25:31 | Msg
receive:{"did":"XSQ000200000005","msg":{"id":"5560","type":1,"content":"ZBINh6CBsLw7k2xjr1wslSjY+5QavEgYU6AzzLZn0lOgON9ZYHnNP4UJVUGB+/SpsxZQnrWR9PlULzpSP/p9l9t8wiAwj8qhznRaT8jeyx1/EUrDE0oXJB8GxWaLJUICCbC92j4BMA2HU8vgcfDOp9nSy1KFafi9zgFiCf9Igqo="}}
Go result:
2018-07-11 17:22:03 broker.go DEBUG Ready to send msg 5560 with type 1 to XSQ000200000005
2018-07-11 17:22:28 broker.go DEBUG Ready to send msg 5560 with type 1 to XSQ000200000005
2018-07-11 17:23:21 broker.go DEBUG Ready to send msg 5560 with type 1 to XSQ000200000005
2018-07-11 17:25:38 broker.go DEBUG Ready to send msg 5560 with type 1 to XSQ000200000005
Please ignore the other errors; they are just business-related.
Here is my Go consumer:
func (b *Broker) createConsumer(topic string, vendor int32) error {
    config := nsq.NewConfig()
    laddr := "127.0.0.1"
    // so that the test can simulate binding consumer to specified address
    config.LocalAddr, _ = net.ResolveTCPAddr("tcp", laddr+":0")
    // so that the test can simulate reaching max requeues and a call to LogFailedMessage
    config.DefaultRequeueDelay = 0
    // so that the test won't time out from backing off
    config.MaxBackoffDuration = time.Millisecond * 50

    c, err := nsq.NewConsumer(topic, "channel_box_"+util.String(vendor), config)
    if err != nil {
        return log.Error("Failed to new nsq consumers.")
    }

    c.AddConcurrentHandlers(nsq.HandlerFunc(func(message *nsq.Message) error {
        if err := b.handle(message, vendor); err != nil {
            log.Errorf("Handle message %v for vendor %d from mq failed.", message.ID, vendor)
        }
        return nil
    }), 5)

    if err = c.ConnectToNSQLookupds(b.Opts.Nsq.Lookup); err != nil {
        return log.Error("Failed to connect to nsq lookup server.")
    }

    b.consumers = append(b.consumers, c)
    return nil
}
