Skip the same combination in the Stream API - Java

I have a List filteredList that I am streaming over, using forEach to set some fields on each element:
filteredList.parallelStream().forEach(s -> {
    ARChaic option = new ARChaic();
    option.setCpu(s.getNoOfCPU());
    option.setMem(s.getMemory());
    option.setStorage(s.getStorage());
    option.setOperatingSystem(s.getOperationSystem());
    ARChaic newOption = providerDes.getLatest(option); // this is an external service
    s.setCloudMemory(newOption.getMem());
    s.setCloudCPU(newOption.getCpu());
    s.setCloudStorage(newOption.getStorage());
    s.setCloudOS(newOption.getOperatingSystem());
});
The goal is to call this service, but if two elements produce the same option, reuse the result of the earlier call instead of calling again.
For example, if two servers have the same memory, CPU, OS and storage, getLatest should be called only once.
Suppose the elements at positions 1 and 7 in filteredList have the same configuration: I shouldn't call getLatest again at 7, because I already have the option value from 1 and can set it on 7 (the work done after the service call).

You can add equals and hashCode to your Server class to define when two Server instances are equal. From your description, they have to compare the memory, CPU, OS and storage.
After that, you can collect filteredList into a Map<Server, List<Server>>: the keys are the unique servers and each value holds all the server instances with that configuration. You call the service once per key, and once you have the result you update every server instance in that key's value list.
Map<Server, List<Server>> uniqueServers = filteredList.stream()
        .collect(Collectors.groupingBy(Function.identity()));

uniqueServers.entrySet().parallelStream().forEach(entry -> {
    Server currentServer = entry.getKey(); // current unique configuration
    ARChaic option = new ARChaic();
    option.setCpu(currentServer.getNoOfCPU());
    option.setMem(currentServer.getMemory());
    option.setStorage(currentServer.getStorage());
    option.setOperatingSystem(currentServer.getOperationSystem());
    ARChaic newOption = providerDes.getLatest(option); // this is an external service
    // Update all servers that share this configuration with the result.
    entry.getValue().forEach(server -> {
        server.setCloudMemory(newOption.getMem());
        server.setCloudCPU(newOption.getCpu());
        server.setCloudStorage(newOption.getStorage());
        server.setCloudOS(newOption.getOperatingSystem());
    });
});
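For completeness, a minimal sketch of the equals/hashCode pair on Server; the field types are assumed, and only the four configuration getters used above take part in equality:
import java.util.Objects;

public class Server {
    // ... existing fields, getters and setters ...

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Server)) return false;
        Server other = (Server) o;
        return Objects.equals(getNoOfCPU(), other.getNoOfCPU())
                && Objects.equals(getMemory(), other.getMemory())
                && Objects.equals(getStorage(), other.getStorage())
                && Objects.equals(getOperationSystem(), other.getOperationSystem());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getNoOfCPU(), getMemory(), getStorage(), getOperationSystem());
    }
}
With this in place, groupingBy(Function.identity()) puts servers with the same memory, CPU, OS and storage under the same key.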

How to Set BucketLifeCycleRule in Minio?

After creating a Minio bucket, I set the bucket's lifecycle rules. The LifecycleRule takes an expiration that is set to just 1 day. When checking the status of my bucket through the Minio client (mc) with mc ilm ls mycloud/bucketName, I can see that the lifecycle rule was successfully applied to the designated bucket. However, when checking back on Minio after 1 day, the bucket is still there. Is there something else that I need to add to the LifecycleRule in order to delete the Minio bucket properly?
Note, I've been using the Minio Java Client API SDK as reference.
fun createBucket(bucketName: String) {
    client.makeBucket(MakeBucketArgs.builder().bucket(bucketName).build())
    setBucketLifeCycle(bucketName)
}

private fun setBucketLifeCycle(bucketName: String) {
    // Setting the expiration to one day.
    val expiration = Expiration(null as ZonedDateTime?, 1, null)
    val lifecycleRuleList = mutableListOf<LifecycleRule>()
    val lifecycleRuleExpiry = LifecycleRule(
        Status.ENABLED,
        null,
        expiration,
        RuleFilter("expiry/logs"),
        "rule 1",
        null,
        null,
        null)
    lifecycleRuleList.add(lifecycleRuleExpiry)
    val lifecycleConfig = LifecycleConfiguration(lifecycleRuleList)
    // Applies the lifecycleConfig to the target bucket.
    client.setBucketLifecycle(SetBucketLifecycleArgs.builder()
        .bucket(bucketName).config(lifecycleConfig).build())
}
Questions
Am I missing something more in my LifecycleRule?
Could it be that the bucket does not get automatically deleted because it has objects inside of it?
I did notice with the minio client that when the bucket has items in it, mc rb mycloud/bucketName fails to remove the bucket, but forcing it with mc rb --force mycloud/bucketName removes it successfully. Is there a way to specify "force" in the lifecycle parameters?
Lifecycle rules apply to objects within a bucket, not to the bucket itself.
An S3 Lifecycle configuration is a set of rules that define actions that Amazon S3 applies to a group of objects.
(ref: https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html)
So the bucket itself will not be deleted, even when all the objects in it have expired via ILM policies.
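If the bucket itself should also go away once its objects have expired, it has to be removed explicitly, and only an empty bucket can be removed this way. A minimal sketch using the Java client API the question references (bucketName as above):
// Lifecycle rules only expire objects; the bucket must be deleted explicitly once it is empty.
client.removeBucket(RemoveBucketArgs.builder()
        .bucket(bucketName)
        .build());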

Java Reactive. How to wait for all data in flux and then process them

I'm getting data from a reactive Mongo repository and updating it. Then I have to collect all the data into one collection and pass it to another service to get more info, and finally map everything into one Flux. My code is:
Flux<Views> views = someRepository.findAllByUsersIn(userId)
        .doOnNext(v -> v.setInterlocutor(
                v.getUsers().stream().filter(u -> !userId.equals(u)).findFirst().orElse(null)));

return Flux.zip(views.map(view -> conversionService.convert(view, ResponseViewDto.class)),
                getUserInfo(views).flux())
        .flatMap(fZip -> {
            ResponseViewDto dto = fZip.getT1();
            dto.setInterlocutor(fZip.getT2().get(dto.getInter()));
            return Flux.just(dto);
        });
getUserInfo collects the user IDs, sends them to another service and returns the expanded info.
I found that the DB is queried twice, and I can understand why, but is there any solution to do it only once while still being non-blocking?
Thanks to Adhika Setya Pramudita for the help. The way to do what I need is simply to use the cache() method.
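For reference, a minimal sketch of where cache() fits in the snippet above (names unchanged from the question):
Flux<Views> views = someRepository.findAllByUsersIn(userId)
        .doOnNext(v -> v.setInterlocutor(
                v.getUsers().stream().filter(u -> !userId.equals(u)).findFirst().orElse(null)))
        .cache(); // replay the same elements to every subscriber instead of querying Mongo again

// The Flux.zip(...) from the question stays exactly as it is: both views.map(...) and
// getUserInfo(views) now subscribe to the cached sequence, so the repository is hit only once.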

Applying a Single to an ObservableSource and not over-reading

I'm pretty new to Rx in general, and RxJava in particular, so pardon any mistakes.
This operation depends on two async operations.
The first uses a filter function to try to get a single entity from a list returned by an async Observable.
The second is an async operation that communicates with a device and produces an Observable of status updates.
I want to take the Single created by the filter function, apply it to pairReader(...), and subscribe to its Observable for updates. I can get this to work as shown, but only if I include the take(1) commented below; otherwise I get an exception because the chain tries to pull another value from the Single.
Observable<DeviceCredential> getCredentials() {
    return deviceCredentialService()
            .getCredentials()
            .flatMapIterable(event -> event.getData());
}

Single<Organization> getOrgFromCreds(String orgid) {
    return getCredentials()
            // A device is logically constrained to only have a single cred per org
            .map(DeviceCredential::getOrganization)
            .filter(org -> org.getId().equals(orgid))
            .take(1) // Without this I get an exception
            .singleOrError();
}

Function<Organization, Observable<Reader.EnrollmentState>> pairReader(String name) {
    return org -> readerService().pair(name, org);
}

getOrgFromCreds(orgid)
        .flatMapObservable(pairReader(readerid))
        .subscribe(state -> {
            switch (state) {
                case BEGUN:
                    LOG.d(TAG, "Pairing begun");
                    break;
                case PAIRED:
                    LOG.d(TAG, "Pairing success");
                    callback.success();
                    break;
                case NOTIFIED_SERVER:
                    LOG.d(TAG, "Pairing server notified");
                    break;
            }
        },
        error -> {
            Crashlytics.logException(error);
            callback.error(error.getLocalizedMessage());
        });
If the source stream emits more than one item, singleOrError() is supposed to emit an error (see the documentation).
For your case, use either first() or firstOrError() instead.
Single<Organization> getOrgFromCreds(String orgid) {
    return getCredentials()
            .map(DeviceCredential::getOrganization)
            .filter(org -> org.getId().equals(orgid))
            .firstOrError();
}
If I understood you correctly, you need to perform some action using previously retrieved async data. For that you could use the .zip() operator.
Here is an example:
Observable.zip(
        getOrgFromCreds().toObservable(),
        getCredentials(),
        (first, second) -> /* create output object here */ )
    .subscribe(
        (n) -> /* do onNext */,
        (e) -> /* do onError */
    );
Note that the .zip() operator waits for an emission from both streams and only then creates the outer emission using the function you provide in "create output object here".
If you don't want to wait for both items, you can use .combineLatest() instead, as sketched below.
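A sketch of the combineLatest variant, keeping the same placeholders as the zip example above:
Observable.combineLatest(
        getOrgFromCreds().toObservable(),
        getCredentials(),
        (first, second) -> /* create output object here */ )
    .subscribe(
        (n) -> /* do onNext */,
        (e) -> /* do onError */
    );
Unlike zip, this emits as soon as each source has produced at least one value, and again whenever either source emits something new.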
The problem here turned out to be that the API was designed in an odd way (and unfortunately has extremely poor documentation). I couldn't figure out why I was getting duplicates and thought I was using flatMapIterable incorrectly.
What the deviceCredentialService.getCredentials() call actually creates is an observable that emits DataEvent objects, which are simple wrappers over a list of results together with a flag indicating where the results came from.
The API designer wanted to let the user fill the UI immediately with locally cached data while a longer request to a REST API executes. The DataEvent.from property is an enum that flags the source: either the local device cache or the remote API call.
The way I solved this was to simply ignore the results coming from the local cache and only emit results from the API:
Observable<DeviceCredential> getCredentials() {
    return deviceCredentialService()
            .getCredentials()
            // Only get creds from the network
            .filter(e -> e.getFrom() == SyncedDataSourceObservableFactory.From.SOURCE)
            .flatMapIterable(e -> e.getData());
}

Single<Organization> getOrgFromCreds(String orgid) {
    return getCredentials()
            // A device is logically constrained to only have a single cred per org
            .map(DeviceCredential::getOrganization)
            .filter(org -> org.getId().equals(orgid))
            .singleOrError();
}
The plan then is to use memoization to cache entities in a way that gives the implementing app access to cache invalidation. Since the provided interface doesn't allow suppressing the API call, there is no way to work only with the cache even when the app considers it fresh.
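A rough sketch of what that memoization could look like; the field and method names here are made up for illustration:
import java.util.concurrent.atomic.AtomicReference;

// Holds the last credentials stream; cache() replays it to later subscribers,
// and invalidateCredentials() forces the next call to hit the API again.
private final AtomicReference<Observable<DeviceCredential>> cachedCredentials = new AtomicReference<>();

Observable<DeviceCredential> getCredentialsMemoized() {
    return cachedCredentials.updateAndGet(existing ->
            existing != null ? existing : getCredentials().cache());
}

void invalidateCredentials() {
    cachedCredentials.set(null);
}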

Can Spark Streaming do Anything Other Than Word Count?

I'm trying to get to grips with Spark Streaming but I'm having difficulty. Despite reading the documentation and analysing the examples, I want to do something more than a word count on a text file/stream/Kafka queue, which is about the only thing the docs walk you through.
I want to listen to an incoming Kafka message stream, group the messages by key and then process them. The code below is a simplified version of the process: get the stream of messages from Kafka, reduce by key to group messages by message key, then process them.
JavaPairDStream<String, byte[]> groupByKeyList = kafkaStream.reduceByKey((bytes, bytes2) -> bytes);

groupByKeyList.foreachRDD(rdd -> {
    List<MyThing> myThingsList = new ArrayList<>();
    MyCalculationCode myCalc = new MyCalculationCode();
    rdd.foreachPartition(partition -> {
        while (partition.hasNext()) {
            Tuple2<String, byte[]> keyAndMessage = partition.next();
            MyThing aSingleMyThing = MyThing.parseFrom(keyAndMessage._2); // parse from protobuf format
            myThingsList.add(aSingleMyThing);
        }
    });
    List<MyResult> results = myCalc.doTheStuff(myThingsList);
    // other code here to write results to file
});
When debugging I see that inside while (partition.hasNext()) the myThingsList has a different memory address than the List<MyThing> myThingsList declared in the outer foreachRDD.
When List<MyResult> results = myCalc.doTheStuff(myThingsList); is called, there are no results, because myThingsList is a different instance of the list.
I'd like a solution to this problem, but would prefer a reference to documentation that helps me understand why this is not working (as anticipated) and how I can solve it for myself (I don't mean a link to the single page of Spark documentation, but a specific section/paragraph or, preferably, a link to JavaDoc that doesn't just provide Scala examples with non-functional commented code).
The reason you're seeing different list addresses is that Spark doesn't execute foreachPartition locally on the driver; it has to serialize the function and send it over to the executor handling the processing of that partition. You have to remember that although working with the code feels like everything runs in a single location, the calculation is actually distributed.
The first problem I see with your code is your reduceByKey, which takes two byte arrays and returns the first. Is that really what you want to do? It means you're effectively dropping parts of the data. Perhaps you're looking for combineByKey, which lets you return a JavaPairDStream<String, List<byte[]>>.
Regarding parsing your protobuf, it looks to me like you don't want foreachRDD; you need an additional map to parse the data:
kafkaStream
    .combineByKey(/* implement logic */)
    .flatMap(x -> x._2)
    .map(proto -> MyThing.parseFrom(proto))
    .map(myThing -> myCalc.doStuff(myThing))
    .foreachRDD(/* After all the processing, do stuff with result */);
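For reference, the combineByKey step could be filled in roughly like this; it is only a sketch, and the HashPartitioner and its partition count are assumptions rather than anything from the question:
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.spark.HashPartitioner;

// Collect every payload for a key into a List<byte[]>.
JavaPairDStream<String, List<byte[]>> grouped = kafkaStream.combineByKey(
        v -> new ArrayList<>(Collections.singletonList(v)),   // createCombiner: start a list for the first value
        (list, v) -> { list.add(v); return list; },            // mergeValue: add further values for the same key
        (l1, l2) -> { l1.addAll(l2); return l1; },             // mergeCombiners: merge lists from different partitions
        new HashPartitioner(4));                               // partition count is arbitrary here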

Combination of RxJava and RxAndroid?

My scenario is very similar to this image:
The flow of the app will be like this:
1. The view needs to get updated.
2. Create an observable using RxAndroid to fetch the data from a cache / local file.
3. Update the view.
4. Make another network call using Retrofit and RxJava to update the view again with new data coming from the web services.
5. Update the local file with the new data.
So I am updating the view twice (once from the local file and right after that from the web services).
How can I achieve this using RxJava and RxAndroid? What I was thinking is:
1. Create an observable1 to get the data from the local file system.
2. In the onNext method of observable1, create another observable2.
3. In observable2.onNext() update the local file.
Now how will I update the view with the updated data (loaded in the file)?
What would be a good approach?
I wrote a blog post about exactly this scenario. I used the merge operator (as suggested by sockeqwe) to address your points 2 and 4 in parallel, and doOnNext to address 5:
// NetworkRepository.java
public Observable<Data> getData() {
    // implementation
}

// DiskRepository.java
public Observable<Data> getData() {
    // implementation
}

// DiskRepository.java
public void saveData(Data data) {
    // implementation
}

// DomainService.java
public Observable<Data> getMergedData() {
    return Observable.merge(
            diskRepository.getData().subscribeOn(Schedulers.io()),
            networkRepository.getData()
                    .doOnNext(new Action1<Data>() {
                        @Override
                        public void call(Data data) {
                            diskRepository.saveData(data); // <-- save to cache
                        }
                    }).subscribeOn(Schedulers.io())
    );
}
In my blog post I additionally used filter and Timestamp to skip updating the UI if the data is the same or if the cache is empty (you didn't specify this, but you will likely run into it as well).
Link to the post: https://medium.com/@murki/chaining-multiple-sources-with-rxjava-20eb6850e5d9
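As a rough illustration (not code from the post), that filter step could be as simple as dropping empty cache emissions before they reach the UI; isEmpty() is an assumed helper on Data:
Observable<Data> displayableData = getMergedData()
        .filter(data -> data != null && !data.isEmpty()); // skip empty cache hits so the UI isn't cleared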
Looks like the concat or merge operator is what you are looking for.
The difference between them is:
Merge may interleave the items emitted by the merged Observables (a similar operator, Concat, does not interleave items, but emits all of each source Observable’s items in turn before beginning to emit items from the next source Observable).
Observable retroFitBackendObservable = retrofitBackend.getFoo()
        .doOnNext(foo -> { /* save it into the local file */ });
Observable mergedObservable = Observable.merge(cacheObservable, retroFitBackendObservable);
mergedObservable.subscribe( ... );
Then subscribe to mergedObservable and update your view in onNext().
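And if the cache result must always be rendered before the network result, with no interleaving, concat drops in the same way; updateView() here is just a placeholder:
Observable orderedObservable = Observable.concat(cacheObservable, retroFitBackendObservable);
orderedObservable.subscribe(foo -> updateView(foo)); // cache emission first, then the refreshed backend data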
