In my Spring application, I have a Couchbase repository for a document type of QuoteOfTheDay. The document is very basic, just has an id field of type UUID, value field of type String and created date field of type Date.
In my service class, I have a method that returns a random quote of the day. Initially I tried simply doing the following, which returned an Optional<QuoteOfTheDay>, but it would seem that findAny() would pretty much always return the same element in the stream. There are only about 10 elements at the moment.
public Optional<QuoteOfTheDay> random() {
return StreamSupport.stream(repository.findAll().spliterator(), false).findAny();
}
Since I wanted something more random, I implemented the following which just returns a QuoteOfTheDay.
public QuoteOfTheDay random() {
int count = Long.valueOf(repository.count()).intValue();
if(count > 0) {
Random r = new Random();
List<QuoteOfTheDay> quotes = StreamSupport.stream(repository.findAll().spliterator(), false)
.collect(toList());
return quotes.get(r.nextInt(count));
} else {
throw new IllegalStateException("No quotes found.");
}
}
I'm just curious how the findAny() method of Stream actually works since it doesn't seem to be random.
Thanks.
The reason behind findAny() is to give a more flexible alternative to findFirst(). If you are not interested in getting a specific element, this gives the implementing stream more flexibility in case it is a parallel stream.
No effort will be made to randomize the element returned, it just doesn't give the same guarantees as findFirst(), and might therefore be faster.
This is what the Javadoc says on the subject:
The behavior of this operation is explicitly nondeterministic; it is free to select any element in the stream. This is to allow for maximal performance in parallel operations; the cost is that multiple invocations on the same source may not return the same result. (If a stable result is desired, use findFirst() instead.)
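To see the contrast directly, here is a minimal sketch (not from the original answer): on a sequential stream, findAny() typically ends up returning the first element anyway, which matches the behavior you observed, while a parallel stream may surface any element.
import java.util.Arrays;
import java.util.List;

public class FindAnyDemo {
    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

        // Stable by contract: always prints 1.
        System.out.println(nums.stream().findFirst().orElse(-1));

        // Nondeterministic by contract, but a sequential stream usually
        // yields the first element here too - hence the "not random" effect.
        System.out.println(nums.stream().findAny().orElse(-1));

        // On a parallel stream, findAny() may print any of the ten elements.
        System.out.println(nums.parallelStream().findAny().orElse(-1));
    }
}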
Don’t collect into a List when all you want is a single item. Just pick one item from the stream. By picking the item via Stream operations you can even handle counts bigger than Integer.MAX_VALUE and don’t need the “interesting” way of hiding the fact that you are casting a long to an int (that Long.valueOf(repository.count()).intValue() thing).
public Optional<QuoteOfTheDay> random() {
long count = repository.count();
if (count == 0) return Optional.empty();
Random r = new Random();
long randomIndex = count <= Integer.MAX_VALUE ? r.nextInt((int) count) :
        r.longs(1, 0, count).findFirst().orElseThrow(AssertionError::new);
return StreamSupport.stream(repository.findAll().spliterator(), false)
.skip(randomIndex).findFirst();
}
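Two side notes on this approach: skip(randomIndex) still advances the iterator-backed stream one element at a time, so selecting a quote is O(n) per call - irrelevant for ten quotes, but worth knowing. And ThreadLocalRandom.current() could be used instead of allocating a new Random on every invocation.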
Related
My classes.
class MyLoan {
private Long loanId;
private BigDecimal loanAmount;
private BigDecimal totalPaid;
....
}
class Customer {
private Long loanId;
private List<MyLoan> myLoan;
}
I want to iterate over the myLoan from a Customer and calculate the totalPaid amount.
My logic is: "If loanId is 23491L or 23492L, then add the loanAmount of those two loanIds and set the value in the totalPaid amount of loanId 23490L". The totalPaid amount is always showing as zero with my logic below.
I want to use Java 8 streams, but I'm unable to write multiple conditions when using them.
BigDecimal spreadAmount = BigDecimal.ZERO;
for (MyLoan myloan: customer.getMyLoan()) {
if (myloan.getLoanId() == 23491L || myloan.getLoanId() == 23492L) {
spreadAmount = spreadAmount.add(myloan.getLoanAmount());
}
if (myloan.getLoanId() == 23490L) {
myloan.setTotalPaid(spreadAmount);
}
}
The totalPaid field is not modified because your MyLoan instance with id 23490L is encountered before the other two MyLoans.
As @Silvio Mayolo has suggested in the comments, you should first compute the total amount in a temp variable and then assign it to the totalPaid field of the MyLoan instance with id 23490L.
This is a stream implementation of what you were trying to do:
// The if makes sure that the MyLoan element invoking the setter is actually present
if (myLoan.stream().map(MyLoan::getLoanId).anyMatch(value -> value == 23490L)) {
    myLoan.stream()
          .filter(loan -> loan.getLoanId() == 23490L)
          .findFirst()
          .get()
          .setTotalPaid(myLoan.stream()
                  .filter(loan -> loan.getLoanId() == 23491L || loan.getLoanId() == 23492L)
                  .map(MyLoan::getLoanAmount)
                  .reduce(BigDecimal.ZERO, BigDecimal::add));
}
WARNING
The method get(), invoked on the Optional retrieved with the terminal operation findFirst(), could throw a NoSuchElementException if a MyLoan with id 23490L is not present in the list. You should first make sure that the element is present, as I've done with my if statement.
A second approach (a bad practice) would be to catch the NoSuchElementException thrown by get() in case the desired MyLoan is not present. As pointed out in the comments, catching a RuntimeException (NoSuchElementException is a subclass of it) is bad practice: we should investigate the origin of the problem rather than simply catching the exception. This second approach was honestly a (lazy) last resort, shown only to demonstrate another possible way of handling the case.
Firstly, you need to fetch the loan for which you want to set the total paid amount. If this step succeeds, then calculate the total.
In order to find a loan with a particular id using streams, you need to create a stream over the customer's loans and apply filter() in conjunction with findFirst(). That gives you the first element from the stream that matches the predicate passed into the filter. Because a result might not be present in the stream, findFirst() returns an Optional.
The Optional class offers a wide range of methods to interact with it, like orElse(), orElseGet(), ifPresent(), etc. Avoid blindly using get() unless you have first checked that a value is present; in many cases it isn't the most convenient way to deal with an Optional. In the code below, ifPresent() is used to proceed with the logic only if a value is present.
If the required loan was found, the next step is to calculate the total. This is done by filtering the stream down to the target ids, extracting the amounts with map(), and adding them together using reduce() as the terminal operation.
public static void setTotalPaid(Customer customer, Long idToSet, Long... idsToSumUp) {
List<MyLoan> loans = customer.getMyLoan();
getLoanById(loans, idToSet).ifPresent(loan -> loan.setTotalPaid(getTotalPaid(loans, idsToSumUp)));
}
public static Optional<MyLoan> getLoanById(List<MyLoan> loans, Long id) {
return loans.stream()
.filter(loan -> loan.getLoanId().equals(id))
.findFirst();
}
public static BigDecimal getTotalPaid(List<MyLoan> loans, Long... ids) {
Set<Long> targetLoans = Set.of(ids); // wrapping with set to improve performance
return loans.stream()
.filter(loan -> targetLoans.contains(loan.getLoanId()))
.map(MyLoan::getLoanAmount)
.reduce(BigDecimal.ZERO, BigDecimal::add);
}
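For completeness, a call using the ids from the question would look like this:
setTotalPaid(customer, 23490L, 23491L, 23492L);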
Here is my Java code:
static Map<BigInteger, Integer> cache = new ConcurrentHashMap<>();
static Integer minFinder(BigInteger num) {
if (num.equals(BigInteger.ONE)) {
return 0;
}
if (num.mod(BigInteger.valueOf(2)).equals(BigInteger.ZERO)) {
//focus on the stuff that's happening inside this block, since with the given inputs it won't reach the last return
return 1 + cache.computeIfAbsent(num.divide(BigInteger.valueOf(2)),
n -> minFinder(n));
}
return 1 + Math.min(cache.computeIfAbsent(num.subtract(BigInteger.ONE), n -> minFinder(n)),
cache.computeIfAbsent(num.add(BigInteger.ONE), n -> minFinder(n)));
}
I tried to memoize a function that returns the minimum number of actions (division by 2, subtracting one, or adding one) needed to reduce a number to 1.
The problem I'm facing is when I call it with smaller inputs such as:
minFinder(new BigInteger("32"))
it works, but with bigger values like:
minFinder(new BigInteger("64"))
It throws an IllegalStateException ("Recursive update").
Is there any way to increase recursion size to prevent this exception or any other way to solve this?
From the API docs of Map.computeIfAbsent():
The mapping function should not modify this map during computation.
The API docs of ConcurrentHashMap.computeIfAbsent() make that stronger:
The mapping function must not modify this map during computation.
(Emphasis added)
You are violating that by using your minFinder() method as the mapping function. That it seems nevertheless to work for certain inputs is irrelevant. You need to find a different way to achieve what you're after.
Is there any way to increase recursion size to prevent this exception or any other way to solve this?
You could avoid computeIfAbsent() and instead do the same thing the old-school way:
// Replacing the computeIfAbsent() call in the even branch:
BigInteger halfNum = num.divide(BigInteger.valueOf(2));
Integer cachedValue = cache.get(halfNum); // the cache maps BigInteger -> Integer
if (cachedValue == null) {
    cachedValue = minFinder(halfNum);
    cache.put(halfNum, cachedValue);
}
return 1 + cachedValue;
But that's not going to be sufficient if the computation loops. You could perhaps detect that by putting a sentinel value into the map before you recurse, so that you can recognize loops.
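For illustration, a rough sketch of that sentinel idea (the names IN_PROGRESS and memoizedMinFinder are made up here, and the computeIfAbsent() calls inside minFinder() would have to be replaced with calls to this method):
// Hypothetical sketch of loop detection with a sentinel value.
static final Integer IN_PROGRESS = Integer.MIN_VALUE;

static Integer memoizedMinFinder(BigInteger num) {
    Integer cached = cache.get(num);
    if (cached != null) {
        if (cached.equals(IN_PROGRESS)) {
            // We re-entered a computation that is still in progress: a loop.
            throw new IllegalStateException("Cycle detected at " + num);
        }
        return cached;
    }
    cache.put(num, IN_PROGRESS);     // mark this key as "being computed"
    Integer result = minFinder(num); // the recursion now only sees plain get/put
    cache.put(num, result);          // replace the sentinel with the real value
    return result;
}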
I have encountered a bit of a paradox that I am trying to understand. Basically I have two variants of an object in a threaded setting - the variants only differ in that one has an immutable array of immutable objects of fixed length, and yet this second variant is considerably slower than the first. Here is the setup:
final class Object { // note: this name shadows java.lang.Object
    public Pair<Long, ImmutableThing> cache;

    public ImmutableThing getThing(long timestamp) {
        if (timestamp > cache.getKey()) {
            ImmutableThing newThing = doExpensiveComputation(timestamp);
            cache = new Pair<>(newThing.getLong(), newThing);
            return newThing;
        } else {
            return cache.getValue();
        }
    }
}
This first version shows much better performance for the getThing method: it looks up the cache and, if the data is valid, returns it; otherwise it does a fairly expensive computation, updates the cache, and returns the new value. I understand this is not thread safe as written, but here is the second variant:
final class SlowerObject {
    public Pair<Long, ImmutableThing> cache;
    public final ArrayList<ImmutableThing> timelineOfThings; // fixed length, never rewritten

    public ImmutableThing getThing(long timestamp) {
        if (timestamp > cache.getKey()) {
            ImmutableThing newThing = findInTimelineOfThings(timestamp);
            cache = new Pair<>(newThing.getLong(), newThing);
            return newThing;
        } else {
            return cache.getValue();
        }
    }
}
In this second variant, we pre-compute an array which stores all the possible values of the things we want to return from getThing (there are only 4 possibilities in my case). Instead of doing a computation if the cache is invalid, we just look up entries in the array until we find the correct one, and the computation to figure out which is correct is nearly instant - just comparing long values. The array is never rewritten, just read.
This is all occurring in a threaded environment. Why should the second one be slower?
The following program is a recursive program to find the maximum and minimum of an array. (I think! Please tell me if it is not a valid recursive program.) Though there are easier ways to find the maximum and minimum in an array, I'm doing it recursively only as part of an exercise!
This program works correctly and produces the outputs as expected.
In the comment line where I have marked "Doubt here!", I am unable to understand why an error is not given during compilation. The return type is clearly an integer array (as specified in the method definition), but I have not assigned the returned data to any integer array, yet the program still works. I was expecting an error during compilation, but it worked. If someone could help me figure this out, it'd be helpful! :)
import java.io.*;
class MaxMin_Recursive
{
static int i=0,max=-999,min=999;
public static void main(String[] args) throws IOException
{
BufferedReader B = new BufferedReader(new InputStreamReader(System.in));
int[] inp = new int[6];
System.out.println("Enter a maximum of 6 numbers..");
for(int i=0;i<6;i++)
inp[i] = Integer.parseInt(B.readLine());
int[] numbers_displayed = new int[2];
numbers_displayed = RecursiveMinMax(inp);
System.out.println("The maximum of all numbers is "+numbers_displayed[0]);
System.out.println("The minimum of all numbers is "+numbers_displayed[1]);
}
static int[] RecursiveMinMax(int[] inp_arr) //remember to specify that the return type is an integer array
{
int[] retArray = new int[2];
if(i<inp_arr.length)
{
if(max<inp_arr[i])
max = inp_arr[i];
if(min>inp_arr[i])
min = inp_arr[i];
i++;
RecursiveMinMax(inp_arr); //Doubt here!
}
retArray[0] = max;
retArray[1] = min;
return retArray;
}
}
The return type is clearly an integer array (as specified in the method definition), but I have not assigned the returned data to any integer array, but the program still works.
Yes, because it's simply not an error to ignore the return value of a method. Not as far as the compiler is concerned. It may well represent a bug, but it's a perfectly valid use of the language.
For example:
Console.ReadLine(); // User input ignored!
"text".Substring(10); // Result ignored!
Sometimes I wish it could be treated as a warning - and indeed ReSharper will give warnings when it can detect that "pure" methods (those without any side-effects) are called without using the return value. In particular, calls which cause problems in real life:
Methods on string such as Replace and Substring, where users assume that calling the method alters the existing string
Stream.Read, where users assume that all the data they've requested has been read, when actually they should use the return value to see how many bytes have actually been read
There are times where it's entirely appropriate to ignore the return value for a method, even when it normally isn't for that method. For example:
TValue GetValueOrDefault<TKey, TValue>(Dictionary<TKey, TValue> dictionary, TKey key)
{
TValue value;
dictionary.TryGetValue(key, out value);
return value;
}
Normally when you call TryGetValue you want to know whether the key was found or not - but in this case value will be set to default(TValue) even if the key wasn't found, so we're going to return the same thing anyway.
In Java (as in C and C++) it is perfectly legal to discard the return value of a function. The compiler is not obliged to give any diagnostic.
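The same holds in Java; a trivial sketch analogous to the C# examples above:
public class IgnoredReturnDemo {
    public static void main(String[] args) {
        // Compiles without any diagnostic even though the results are discarded.
        // Strings are immutable, so this call has no effect whatsoever:
        "text".substring(2);
        // Likewise for any other value-returning call:
        Math.max(1, 2);
    }
}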
I've implemented Latent Semantic Analysis on Lucene.
The result of the algorithm is a matrix of 2 columns, where the first is the index of the document and the second is the similarity.
I want to write this result into the org.apache.lucene.search.Collector in the search method of Searcher, but I don't know how to set the result in the collector object.
The code for the search method is:
public void search(Weight weight, Filter filter, Collector collector) throws IOException
{
String textQuery = weight.getQuery().toString("contents");
System.out.println(textQuery);
double[][] ind;
ind = lsa.searchOnDoc(textQuery);
//ind contains the index and the similarity
if (ind != null)
{
//construct the collector object
for (int i=0; i<ind.length; i++)
{
int doc =(int) ind[i][0];
double simi = ind[i][1];
//collector.collect(doc);
//collector.setScorer(sim]);
//This is the problem
}
}
else
{
collector = null;
}
}
I don't know the right steps to copy the values of ind into the collector object.
Can you help me?
I don't quite get why you decided to shove LSI into Searcher.
And getting your text query from Weight looks especially shady - why not use the original query instead and skip all the (broken) conversions?
But the Collector is handled as follows.
For each segment in your index:
Supply it the corresponding SegmentReader with collector.setNextReader(reader, base). You can get these with ir.getSequentialSubReaders() and ir.getSubReaderStarts() on the top-level reader. So,
reader may be used by the collector to load sort fields/caches during collection, and additional fields to augment search results when collection is done,
base is the number added to segment/local docIDs (they start from 0 for each segment) to convert them to index/global docIDs.
Supply it a Scorer implementation with collector.setScorer(scorer).
collector may use it during the next phase to get the score for the documents. Though if collector only counts the results, or sorts on some stored field, or just feels so - scorer will be ignored.
The only method collectors invoke on Scorer instance is scorer.score(), which should return the score (I kid you not) for the current document being collected.
Repeatedly call collector.collect(id) with monotonically increasing sequence of segment/local docIDs that match your query.
Going back to your code - make a wrapper that implements Scorer, use a single instance with a field that you update with simi on each iteration, have the wrapper's score() method return that field, and shove this instance into the collector with setScorer() before the loop.
You also need lsa.searchOnDoc to return per-segment results.
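To make that concrete, here is a rough sketch of the wrapper idea (assuming a Lucene 3.x-era API; the protected Scorer constructor takes a Weight in 3.1+ and a Similarity in older versions, and FakeScorer is a made-up name, not part of the original answer):
class FakeScorer extends Scorer {
    float currentScore;
    int currentDoc = -1;

    FakeScorer() {
        super((Weight) null); // assumption: 3.1+ constructor; older versions differ
    }

    @Override public float score() { return currentScore; }
    @Override public int docID() { return currentDoc; }
    @Override public int nextDoc() { throw new UnsupportedOperationException(); }
    @Override public int advance(int target) { throw new UnsupportedOperationException(); }
}

// Inside search(), before the collect loop:
FakeScorer fake = new FakeScorer();
collector.setScorer(fake);
for (int i = 0; i < ind.length; i++) {
    fake.currentDoc = (int) ind[i][0];
    fake.currentScore = (float) ind[i][1];
    collector.collect(fake.currentDoc); // still needs setNextReader() and segment-local ids
}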