Performance - Iterating in Java vs Querying the database

It's difficult for me to think how to ask this so I will create an example to demonstrate what I am asking for:
Suppose I have my model:
public class UserEvaluation {
String name;
Date respondedAt;
}
public class Evaluation {
String name;
List<UserEvaluation> userEvaluations;
}
And then in my EvaluationService I need to know the number of userEvaluations that have been responded to (respondedAt != null).
Possible solutions:
1 By iterating through all the items:
Evaluation evaluation = evaluationRepository.get(1);
long count = 0;
for(UserEvaluation userEvaluation : evaluation.getUserEvaluations()) {
if(userEvaluation.getRespondedAt() != null) {
count++;
}
}
2 By Lambda Expressions:
Evaluation evaluation = evaluationRepository.get(1);
Long count = evaluation.getUserEvaluations().stream()
.filter(ue -> ue.getRespondedAt() != null)
.count();
3 By querying the database:
Evaluation evaluation = evaluationRepository.get(1);
Long count = userEvaluationRepository.getRespondedCountByEvaluation(evaluation); //And implement this simple count query.
So this is the simplest case. Which should I pick? I use a lot of iterators and lambda stream expressions in my app, but I am worried that this is a mistake and that I should be interacting more with the database. Should I? Should I not?

If you are not sure I would design it so you can change it as required. Have a DAO implementation which accesses the database, but one which could be changed to use in memory data if you determine using the database is not fast enough.
interface DatabaseDAO {
long getRespondedCountByEvaluation(Evaluation e);
}
It depends on which approach would retrieve less data. If you grab the data once and extract information from it many times, working in memory can be faster; but if you only need a small portion each time and have an indexed database, querying can be much faster.
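As a rough sketch of that design (not the poster's actual code), the service can depend on a small interface, with an in-memory implementation that can later be swapped for a database-backed one. The names RespondedCountDao and InMemoryRespondedCountDao are made up for illustration, and the JPQL in the closing comment assumes a mapped UserEvaluation entity with an evaluation field, which the question does not show.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Minimal stand-ins for the model classes from the question.
class UserEvaluation {
    final String name;
    final Date respondedAt; // null means "not responded yet"

    UserEvaluation(String name, Date respondedAt) {
        this.name = name;
        this.respondedAt = respondedAt;
    }
}

class Evaluation {
    final String name;
    final List<UserEvaluation> userEvaluations = new ArrayList<>();

    Evaluation(String name) {
        this.name = name;
    }
}

// The abstraction the service depends on (hypothetical name).
interface RespondedCountDao {
    long getRespondedCountByEvaluation(Evaluation e);
}

// In-memory variant: iterates over data that is already loaded.
class InMemoryRespondedCountDao implements RespondedCountDao {
    @Override
    public long getRespondedCountByEvaluation(Evaluation e) {
        long count = 0;
        for (UserEvaluation ue : e.userEvaluations) {
            if (ue.respondedAt != null) {
                count++;
            }
        }
        return count;
    }
}

// A database-backed variant would implement the same interface and run a
// count query instead, for example (JPQL, assuming such a mapping exists):
//   SELECT COUNT(ue) FROM UserEvaluation ue
//   WHERE ue.evaluation = :evaluation AND ue.respondedAt IS NOT NULL
```

Because callers only see the interface, measuring both variants and switching between them does not touch the service code.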

Related

Better method for avoiding null in nested data with Java 7

I have to analyze a huge data stream which often includes incomplete data. Currently the code is littered with null checks at multiple levels, as there could be incomplete data at any level.
So for example I might have to retrieve:
Model.getDestination().getDevice().getName()
I tried to create a method to try and reduce the null checks to a single method whereby I enter:
IsValid(Model.getDestination(), Model.getDestination().getDevice(), Model.getDestination().getDevice().getName())
this method fails because it evaluates all parameters before it sends them, rather than checking each at a time like
Model.getDestination() != null && Model.getDestination().getDevice() != null && etc
but is there a way I could pass in Model.getDestination().getDevice().getName() and do the check at each level without having to evaluate it or split it up before I pass it?
What I really want it to do is if there is a null/nullexception it should quietly return "", and continue processing incoming data
I know there are ways to do this elegantly in Java 8, but I am stuck with Java 7

I struggled with a similar problem with deeply nested structures, and if I had had the opportunity to introduce additional structures just to navigate the underlying data, I think I would have done that.
This was in C#, which in the meantime has gained a safe navigation (Elvis) operator, for which we'll wait in vain in Java (it was proposed for Java 7 but discarded; Groovy has it, by the way). It also looks like there are arguments against using Elvis even if you have it. Lambdas (and extension methods) didn't really improve things either, and every other approach has been discredited as ugly in other posts here.
Therefore I propose a secondary structure purely for navigation, each element with a getValue() method to access the original structure (the shortcuts proposed by @Michael are also straightforward to add this way). This allows null-safe navigation like this:
Model model = new Model(new Destination(null));
Destination destination = model.getDestination().getValue(); // destination is not null
Device device = model.getDestination().getDevice().getValue(); // device will be null, no NPE
String name = model.getDestination().getDevice().getName().getValue(); // name will be null, no NPE
NavDevice navDevice = model.getDestination().getDevice(); // always returns a non-null NavDevice, not a Device
String name2 = navDevice.getValue().getName(); // causes an NPE by circumventing the navigation structure
With straightforward original structures:
class Destination {
private final Device device;
public Destination(Device device) {
this.device = device;
}
public Device getDevice() {
return device;
}
}
class Device {
private final String name;
private Device(String name) {
this.name = name;
}
public String getName() {
return name;
}
}
And secondary structures for the purpose of safe navigation.
Obviously this is debatable, since you can always access the original structure directly and run into an NPE. But in terms of readability I would perhaps still take this, especially for large structures where a thicket of ifs or optionals really is an eyesore (which matters if you have to tell which business rules were actually implemented here).
A memory/speed argument could be countered by using only one navigation object per type and resetting its internals to the appropriate underlying object as you navigate.
class Model {
private final Destination destination;
private Model(Destination destination) {
this.destination = destination;
}
public NavDestination getDestination() {
return new NavDestination(destination);
}
}
class NavDestination {
private final Destination value;
private NavDestination(Destination value) {
this.value = value;
}
public Destination getValue() {
return value;
}
public NavDevice getDevice() {
return new NavDevice(value == null ? null : value.getDevice());
}
}
class NavDevice {
private final Device value;
private NavDevice(Device value) {
this.value = value;
}
public Device getValue() {
return value;
}
public NavName getName() {
return new NavName(value == null ? null : value.getName());
}
}
class NavName {
private final String value;
private NavName(String value) {
this.value = value;
}
public String getValue() {
return value;
}
}

Option 1 - if statement
You already provided it in your question. I think using an if statement like the following is perfectly acceptable:
Model.getDestination() != null && Model.getDestination().getDevice() != null && etc
Option 2 - javax Validation and checking the result - before sending
You could make use of javax validation.
See: https://www.baeldung.com/javax-validation
You would annotate the fields that you want with @NotNull.
Then you could use programmatic validation.
You could check the validation result to see if there is a problem.
Example:
So in your class you would do:
@NotNull
public String destination;
And you could feed your object to the validator:
ValidatorFactory factory = Validation.buildDefaultValidatorFactory();
Validator validator = factory.getValidator();
Set<ConstraintViolation<Model>> violations = validator.validate(model);
for (ConstraintViolation<Model> violation : violations) {
log.error(violation.getMessage());
}
Option 3 - ofNullable and map (if you have Java 8)
I'm taking this one from https://softwareengineering.stackexchange.com/questions/255503/null-checking-whilst-navigating-object-hierarchies . This is very similar to your question.
import java.util.Optional;
Optional.ofNullable(model)
.map(Model::getDestination)
.map(Destination::getDevice)
.ifPresent(device -> .... do what you want ...);
Option 4 - Just using a try/catch
Everyone hates this one because exceptions are slow.
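For readers who can use Java 8, Option 3 might look roughly like the sketch below. The Model, Destination and Device classes are minimal stand-ins for the ones in the question, and DeviceNames is a made-up helper name. Optional.map yields an empty Optional whenever a getter returns null, so orElse("") supplies the quiet default the question asks for.

```java
import java.util.Optional;

class Device {
    private final String name;
    Device(String name) { this.name = name; }
    String getName() { return name; }
}

class Destination {
    private final Device device;
    Destination(Device device) { this.device = device; }
    Device getDevice() { return device; }
}

class Model {
    private final Destination destination;
    Model(Destination destination) { this.destination = destination; }
    Destination getDestination() { return destination; }
}

final class DeviceNames {
    private DeviceNames() {}

    // Returns the device name, or "" if any link in the chain is null.
    // Each getter is invoked at most once.
    static String deviceNameOrEmpty(Model model) {
        return Optional.ofNullable(model)
                .map(Model::getDestination)
                .map(Destination::getDevice)
                .map(Device::getName)
                .orElse("");
    }
}
```

This does not help on Java 7, where java.util.Optional and method references are not available.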

So you want to simplify Model.getDestination().getDevice().getName(). First, I want to list a few things that should not be done: Don't use exceptions. Don't write an IsValid method, because it just doesn't work, because all functions (or methods) are strict in Java: that means that every time you call a function, all arguments are evaluated before they are passed to the function.
In Swift I would just write let name = Model.getDestination()?.getDevice()?.getName() ?? "". In Haskell it would be like name <- (destination >>= getDevice >>= getName) <|> Just "" (assuming the Maybe monad). And this has different semantics from this Java code:
if(Model.getDestination() != null && Model.getDestination().getDevice() != null && Model.getDestination().getDevice().getName() != null) {
String name = Model.getDestination().getDevice().getName();
System.out.println("We got a name: "+name);
}
because this snippet calls getDestination() 4 times, getDevice() 3 times, getName() 2 times. This has more than just performance implications: 1) It introduces race conditions. 2) If any of the methods have side-effects, you don't want them to be called multiple times. 3) It makes everything harder to debug.
The only correct way of doing it is something like this:
Destination dest = Model.getDestination();
Device device = null;
String name = null;
if(dest != null) {
device = dest.getDevice();
if(device != null) {
name = device.getName();
}
}
if(name == null) {
name = "";
}
This code sets name to Model.getDestination().getDevice().getName(), or if any of these method calls return null, it sets name to "". I think correctness is more important than readability, especially for production applications (and even for example code IMHO). The above Swift or Haskell code is equivalent to that Java code.
If you have a production app, I guess that something like that is what you are already doing, because everything that is fundamentally different than that is error-prone.
Every better solution has to provide the same semantics and it MUST not call any of the methods (getDestination, getDevice, getName) more than once.
That said, I don't think you can simplify the code much with Java 7.
What you can do, of course, is shorten the call chains: e.g. you could create a method getDeviceName() on Destination, if you need this functionality often. Whether this makes the code more readable depends on the concrete situation.
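A Java 7 compatible sketch of such a shortcut is shown here. The classes are simplified stand-ins for the ones in the question, and returning "" from getDeviceName() is an assumption based on the default the question asks for; NameLookup is a hypothetical helper name.

```java
class Device {
    private final String name;
    Device(String name) { this.name = name; }
    String getName() { return name; }
}

class Destination {
    private final Device device;
    Destination(Device device) { this.device = device; }
    Device getDevice() { return device; }

    // Shortcut: callers no longer walk the chain themselves.
    // Returns "" when the device or its name is missing; getName() is called once.
    String getDeviceName() {
        if (device == null) {
            return "";
        }
        String name = device.getName();
        return name == null ? "" : name;
    }
}

class Model {
    private final Destination destination;
    Model(Destination destination) { this.destination = destination; }
    Destination getDestination() { return destination; }
}

final class NameLookup {
    private NameLookup() {}

    static String deviceNameOf(Model model) {
        Destination dest = model.getDestination(); // evaluated once
        return dest == null ? "" : dest.getDeviceName();
    }
}
```

The null checks have not disappeared; they have moved next to the data they protect, so each caller is left with a single check.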
Forcing you to code on this low level also has advantages: you can do common subexpression elimination, and you'll see the advantages of it, because it will make the code shorter. E.g. if you have:
String name1 = Model.getDevice().getConnection().getContext().getName();
String name2 = Model.getDevice().getConnection().getContext().getLabel();
you can simplify them to
Context ctx = Model.getDevice().getConnection().getContext();
String name1 = ctx.getName();
String name2 = ctx.getLabel();
The second snippet has 3 lines, while the first snippet has only two lines. But if you unroll the two snippets to include null-checks, you will see that the second version is in fact much shorter. (I'm not doing it now because I'm lazy.)
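For illustration, an unrolled, null-checked form of the second snippet could look like the following sketch. Model, Device, Connection and Context are hypothetical stand-ins (the original only shows the getter names), and the "" default is carried over from the earlier example.

```java
class Context {
    private final String name;
    private final String label;
    Context(String name, String label) { this.name = name; this.label = label; }
    String getName() { return name; }
    String getLabel() { return label; }
}

class Connection {
    private final Context context;
    Connection(Context context) { this.context = context; }
    Context getContext() { return context; }
}

class Device {
    private final Connection connection;
    Device(Connection connection) { this.connection = connection; }
    Connection getConnection() { return connection; }
}

class Model {
    private final Device device;
    Model(Device device) { this.device = device; }
    Device getDevice() { return device; }
}

final class ContextNames {
    private ContextNames() {}

    // Returns { name1, name2 }; every getter in the chain is called once.
    static String[] nameAndLabel(Model model) {
        Context ctx = null;
        Device device = model.getDevice();
        if (device != null) {
            Connection connection = device.getConnection();
            if (connection != null) {
                ctx = connection.getContext();
            }
        }
        String name1 = null;
        String name2 = null;
        if (ctx != null) {
            name1 = ctx.getName();
            name2 = ctx.getLabel();
        }
        return new String[] { name1 == null ? "" : name1, name2 == null ? "" : name2 };
    }
}
```

The shared prefix of the chain is walked and checked once; without the ctx variable, the nested checks for device and connection would have to be repeated for the second value.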
Therefore (regarding Optional-chaining), Java 7 will make the code of the performance-aware coder look better, while many more high-level languages create incentives to make slow code. (Of course you can also do common subexpression elimination in higher level languages (and you probably should), but in my experience most developers are more reluctant to do it in high level languages. Whereas in Assembler, everything is optimized, because better performance often means you have to write less code and the code that you write is easier to understand.)
In a perfect world, we would all use languages that have built-in optional chaining, and we would all use it responsibly, without creating performance problems and race conditions.

You can use try-catch, because no processing is required in your case, like this:
try{
if(IsValid(Model.getDestination(), Model.getDestination().getDevice(), Model.getDestination().getDevice().getName())){
// process the valid data here
}
}catch(Exception e){
//do nothing
}
Alternatively you can improve your isValid method by passing only the Model object:
boolean isValid(Model model){
return (model != null && model.getDestination() != null && model.getDestination().getDevice() != null && model.getDestination().getDevice().getName() != null);
}

Efficient way of mimicking hibernate criteria on cached map

I have just written code to cache a table in memory (a simple Java HashMap). One piece of code I am now trying to replace finds objects based on criteria: it receives multiple field parameters, and those that are not null and not empty were added to a Hibernate query criteria.
To replace this, what I am thinking of doing is:
For each valid param (not null and not empty), I will create a HashSet of the objects that satisfy that criterion.
Once I have made HashSets for all valid criteria, I will call Set.retainAll(second_set) across all the sets, so that at the end I am left with the set that is the intersection of all valid criteria.
Does this sound like the best approach, or is there a better way to implement this?
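A minimal sketch of the set-intersection idea, under assumed names (CachedRow, its color and size fields, and IntersectionSearch are all invented for illustration): starting from a set of all cached objects means every active criterion can simply call retainAll on the running result.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical cached object with two searchable fields.
class CachedRow {
    final long id;
    final String color;
    final String size;

    CachedRow(long id, String color, String size) {
        this.id = id;
        this.color = color;
        this.size = size;
    }
}

final class IntersectionSearch {
    private IntersectionSearch() {}

    static Set<CachedRow> find(Collection<CachedRow> cache, String color, String size) {
        // Start from everything, so there is no need to work out which
        // criterion set should be the first operand of retainAll.
        Set<CachedRow> result = new HashSet<>(cache);
        if (color != null && !color.isEmpty()) {
            Set<CachedRow> byColor = new HashSet<>();
            for (CachedRow row : cache) {
                if (color.equals(row.color)) {
                    byColor.add(row);
                }
            }
            result.retainAll(byColor);
        }
        if (size != null && !size.isEmpty()) {
            Set<CachedRow> bySize = new HashSet<>();
            for (CachedRow row : cache) {
                if (size.equals(row.size)) {
                    bySize.add(row);
                }
            }
            result.retainAll(bySize);
        }
        return result;
    }
}
```

Note that this scans the cache once per active criterion and allocates a temporary set each time, so it does more work than a single filtering pass over the cache.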
EDIT
Although my original question is still valid and I am still looking for an answer, I ended up implementing it in the following way. The reason is that it was rather cumbersome with sets: after creating all the sets, I first had to figure out which set was non-empty so that retainAll could be called on it, which resulted in lots of if-else statements. My current implementation looks like this:
private List<MyObj> getCachedObjs(Long criteria1, String criteria2, String criteria3) {
List<MyObj> results = new ArrayList<>();
int totalActiveFilters = 0;
if (criteria1 != null){
totalActiveFilters++;
}
if (!StringUtil.isBlank(criteria2)){
totalActiveFilters++;
}
if (!StringUtil.isBlank(criteria3)){
totalActiveFilters++;
}
for (Map.Entry<Long, MyObj> objEntry : objCache.entrySet()){
MyObj obj = objEntry.getValue();
int matchedFilters = 0;
if (criteria1 != null) {
if (obj.getCriteria1().equals(criteria1)) {
matchedFilters++;
}
}
if (!StringUtil.isBlank(criteria2)){
if (obj.getCriteria2().equals(criteria2)){
matchedFilters++;
}
}
if (!StringUtil.isBlank(criteria3)){
if (obj.getCriteria3().equals(criteria3)){
matchedFilters++;
}
}
if (matchedFilters == totalActiveFilters){
results.add(obj);
}
}
return results;
}
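One possible tidy-up of the single-pass version, sketched with Java 8 predicates: the active filters are collected once, before the loop, and an object is kept only if it passes all of them. CachedItem, its fields and CriteriaFilter are hypothetical names, and isBlank is a local stand-in for the StringUtil.isBlank call used above.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical cached object with three searchable fields.
class CachedItem {
    final Long ownerId;
    final String status;
    final String region;

    CachedItem(Long ownerId, String status, String region) {
        this.ownerId = ownerId;
        this.status = status;
        this.region = region;
    }
}

final class CriteriaFilter {
    private CriteriaFilter() {}

    static List<CachedItem> find(Collection<CachedItem> cache,
                                 Long ownerId, String status, String region) {
        // Build the list of active filters once, outside the loop.
        List<Predicate<CachedItem>> filters = new ArrayList<>();
        if (ownerId != null) {
            filters.add(item -> ownerId.equals(item.ownerId));
        }
        if (!isBlank(status)) {
            filters.add(item -> status.equals(item.status));
        }
        if (!isBlank(region)) {
            filters.add(item -> region.equals(item.region));
        }

        List<CachedItem> results = new ArrayList<>();
        for (CachedItem item : cache) {
            boolean matches = true;
            for (Predicate<CachedItem> filter : filters) {
                if (!filter.test(item)) {
                    matches = false;
                    break; // stop at the first failed criterion
                }
            }
            if (matches) {
                results.add(item);
            }
        }
        return results;
    }

    // Local stand-in for StringUtil.isBlank.
    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```

Calling equals on the criterion rather than on the cached field also avoids a NullPointerException when a cached object has a null field.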

What is the best way to perform index ranged searches in OrientDB from Java?

We are using OrientDB in the embedded mode, and are hoping to access it directly with Java api calls (not using the SQL-ish language). We have an index, and need to perform a ranged search on it. Here is the only way I have found so far:
String startAt = createInternalOIndexSearchableKey(actualKey);
Index<Edge> index = graph.getIndex(indexName, Edge.class);
OrientIndex orientIndex = (OrientIndex) index;
OIndex oIndex = orientIndex.getUnderlying();
boolean INCLUSIVE = true;
boolean ASCENDING = true;
OIndexCursor cursor = oIndex.iterateEntriesMajor(startAt, INCLUSIVE, ASCENDING);
while(cursor.hasNext())
{
Entry<Object, OIdentifiable> entry = cursor.nextEntry();
// ...process the entry here
}
It feels uncomfortable to be deviating so far from the normal public API. Especially the implementation of createInternalOIndexSearchableKey:
private String createInternalOIndexSearchableKey(String actualKey)
{
// NOTE: Keys passed to OIndex.iterateEntriesMajor must
// be in the (undocumented) format: EdgeLabel!=!ActualKey
return KEY_CAN_DOWNLOAD_PUBCODETIMESTAMP + "!=!" + actualKey;
}
Is there a better way to do this?

OIndex and OIndexCursor are part of the public API of the Document database, so no worries, you can use them.
However, the main aim of this API is to provide flexibility to the SQL engine and other internal components, so it is not very convenient.
I would recommend using SQL queries: they provide the same level of flexibility and are more compact, which makes them more convenient to use.

Java Server Client, shared variable between threads

I am working on a project to create a simple auction server that multiple clients connect to. The server class implements Runnable and so creates a new thread for each client that connects.
I am trying to have the current highest bid stored in a variable that can be seen by each client. I found answers saying to use AtomicInteger, but when I used it with methods such as atomicVariable.intValue() I got null pointer exception errors.
How can I manipulate the AtomicInteger without getting this error, or is there another, relatively simple way to have a shared variable?
Any help would be appreciated, thanks.
Update
I have the AtomicInteger working. The problem now is that only the most recent client to connect to the server seems to be able to interact with it. The other clients just sort of freeze.
Would I be correct in saying this is a problem with locking?

Well, most likely you forgot to initialize it:
private final AtomicInteger highestBid = new AtomicInteger();
However working with highestBid requires a great deal of knowledge to get it right without any locking. For example if you want to update it with new highest bid:
public boolean saveIfHighest(int bid) {
int currentBid = highestBid.get();
while (currentBid < bid) {
if (highestBid.compareAndSet(currentBid, bid)) {
return true;
}
currentBid = highestBid.get();
}
return false;
}
or in a more compact way:
for(int currentBid = highestBid.get(); currentBid < bid; currentBid = highestBid.get()) {
if (highestBid.compareAndSet(currentBid, bid)) {
return true;
}
}
return false;
You might wonder, why is it so hard? Imagine two threads (requests) bidding at the same time with a naive check-then-set. The current highest bid is 10. One is bidding 11, the other 12. Both threads compare their bid with the current highestBid and see that theirs is bigger. Now the second thread happens to go first and updates it to 12. Unfortunately the first request now steps in and reverts it to 11 (because it has already checked the condition).
This is a typical race condition that you can avoid either by explicit synchronization or by using atomic variables with implicit compare-and-set low-level support.
Seeing the complexity introduced by the much more performant lock-free atomic integer, you might want to resort to classic synchronization (here highestBid is a plain int field):
public synchronized boolean saveIfHighest(int bid) {
if (highestBid < bid) {
highestBid = bid;
return true;
} else {
return false;
}
}
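To see the compare-and-set loop from this answer hold up under concurrent bids, here is a small self-contained sketch. HighestBidTracker and runDemo are names invented for this illustration; the saveIfHighest logic is the same as above.

```java
import java.util.concurrent.atomic.AtomicInteger;

class HighestBidTracker {
    // Initialized eagerly, so the field itself can never be null.
    private final AtomicInteger highestBid = new AtomicInteger();

    // Lock-free update: retries until the bid is stored
    // or until it is no longer higher than the current value.
    boolean saveIfHighest(int bid) {
        int current = highestBid.get();
        while (current < bid) {
            if (highestBid.compareAndSet(current, bid)) {
                return true;
            }
            current = highestBid.get();
        }
        return false;
    }

    int current() {
        return highestBid.get();
    }

    // Demo: one thread per bid, bids 1..threads; returns the final highest bid.
    static int runDemo(int threads) {
        final HighestBidTracker tracker = new HighestBidTracker();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int bid = i + 1;
            workers[i] = new Thread(new Runnable() {
                @Override
                public void run() {
                    tracker.saveIfHighest(bid);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return tracker.current();
    }
}
```

All client threads have to share one tracker instance; a tracker created separately per connection would give each client its own private "highest bid".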

I wouldn't look at the problem like that. I would simply store all the bids in a ConcurrentSkipListSet, which is a thread-safe SortedSet. With the correct implementation of compareTo(), which determines the ordering, the first element of the Set will automatically be the highest bid.
Here's some sample code:
public class Bid implements Comparable<Bid> {
String user;
int amountInCents;
Date created;
@Override
public int compareTo(Bid o) {
if (amountInCents == o.amountInCents) {
return created.compareTo(o.created); // earlier bids sort first
}
return Integer.compare(o.amountInCents, amountInCents); // larger bids sort first
}
}
public class Auction {
private SortedSet<Bid> bids = new ConcurrentSkipListSet<Bid>();
public Bid getHighestBid() {
return bids.isEmpty() ? null : bids.first();
}
public void addBid(Bid bid) {
bids.add(bid);
}
}
Doing this has the following advantages:
Automatically provides a bidding history
Allows a simple way to save any other bid info you need
You could also consider this method:
/**
* @param bid
* @return true if the bid was successful
*/
public boolean makeBid(Bid bid) {
if (bids.isEmpty()) {
bids.add(bid);
return true;
}
if (bid.compareTo(bids.first()) >= 0) { // larger bids sort first, so >= 0 means "not higher than the current best"
return false;
}
bids.add(bid);
return true;
}

Using an AtomicInteger is fine, provided you initialise it as Tomasz has suggested.
What you might like to think about, however, is whether all you will literally ever need to store is just the highest bid as an integer. Will you never need to store associated information, such as the bidding time, user ID of the bidder etc? Because if at a later stage you do, you'll have to start undoing your AtomicInteger code and replacing it.
I would be tempted from the outset to set things up to store arbitrary information associated with the bid. For example, you can define a "Bid" class with the relevant field(s). Then on each bid, use an AtomicReference to store an instance of "Bid" with the relevant information. To be thread-safe, make all the fields on your Bid class final.
You could also consider using an explicit Lock (e.g. see the ReentrantLock class) to control access to the highest bid. As Tomasz mentions, even with an AtomicInteger (or AtomicReference: the logic is essentially the same) you need to be a little careful about how you access it. The atomic classes are really designed for cases where they are very frequently accessed (as in thousands of times per second, not every few minutes as on a typical auction site). They won't really give you any performance benefit here, and an explicit Lock object might be more intuitive to program with.
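A sketch of the AtomicReference variant described above, assuming a small immutable bid class. BidInfo, AuctionState and their fields are illustrative names, not part of the original code.

```java
import java.util.concurrent.atomic.AtomicReference;

// Immutable: all fields are final, so an instance can be shared between threads.
final class BidInfo {
    final String userId;
    final int amountInCents;
    final long placedAtMillis;

    BidInfo(String userId, int amountInCents, long placedAtMillis) {
        this.userId = userId;
        this.amountInCents = amountInCents;
        this.placedAtMillis = placedAtMillis;
    }
}

class AuctionState {
    // Holds null until the first bid arrives.
    private final AtomicReference<BidInfo> highest = new AtomicReference<>();

    // Returns true if the candidate became the new highest bid.
    boolean offer(BidInfo candidate) {
        while (true) {
            BidInfo current = highest.get();
            if (current != null && current.amountInCents >= candidate.amountInCents) {
                return false; // not strictly higher than the current best
            }
            if (highest.compareAndSet(current, candidate)) {
                return true;
            }
            // Another thread changed the reference in between: re-read and retry.
        }
    }

    BidInfo highest() {
        return highest.get();
    }
}
```

The retry loop has the same shape as the one for AtomicInteger; the only difference is that the whole bid object is swapped in, so the bidder and timestamp always belong to the stored amount.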

for each inside a for each - Java

for (Tweet tweet : tweets) {
for(long forId : idFromArray){
long tweetId = tweet.getId();
if(forId != tweetId){
String twitterString = tweet.getText();
db.insertTwitter(twitterString, tweetId);
}
}
}
My code won't get past the first for loop, because idFromArray is empty: I don't add anything to it until a tweet has been added to the database.
And even if there is something in the array, it loops over the whole thing for every tweet (duh, since I have two loops), which bloats the database with copies of the same tweets.
It is not a simple comparison of the two tweet IDs that simply ignores the ones with the same ID.
I'm pretty certain there is a really simple solution to this problem, but I still can't wrap my head around it. Anybody?
UPDATE:
What I want is for the code to ignore any tweetId that is already in the database,
and just insert the tweets that are not in the database.
I don't think I should have two for-loops, I think the second loop should be replaced with something? (or maybe I'm wrong?)

If I understand correctly, what you want to do, in pseudo-code is the following:
for (Tweet tweet : tweets) {
if (!db.containsTweet(tweet.getId())) {
db.insertTweet(tweet.getText(), tweet.getId());
}
}
I assume your db class actually uses an sqlite database as a backend? What you could do is implement containsTweet directly and just query the database each time, but that seems less than perfect. The easiest solution if we go by your base code is to just keep a Set around that indexes the tweets. Since I can't be sure what the equals() method of Tweet looks like, I'll just store the identifiers in there. Then you get:
Set<Long> tweetIds = new HashSet<Long>(); // the question uses long IDs
for (Tweet tweet : tweets) {
if (!tweetIds.contains(tweet.getId())) {
db.insertTweet(tweet.getText(), tweet.getId());
tweetIds.add(tweet.getId());
}
}
It would probably be better to save a tiny bit of this work, by sorting the list of tweets to begin with and then just filtering out duplicate tweets. You could use:
// if tweets is a List
Collections.sort(tweets, new Comparator<Tweet>() {
public int compare(Tweet t1, Tweet t2) {
// ascending by ID; Long.compare avoids overflow and the long-to-int cast
return Long.compare(t1.getId(), t2.getId());
}
});
Then process it
Long oldId = null;
for (Tweet tweet : tweets) {
if (oldId == null || oldId != tweet.getId()) {
db.insertTweet(tweet.getText(), tweet.getId());
}
oldId = tweet.getId();
}
Yes, you could do this using a second for-loop, but you'll run into performance problems much more quickly than with this approach (although what we're doing here is trading memory for time, of course).
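Put together, the Set-based variant could look like the sketch below. Tweet and FakeTweetDb are simplified stand-ins for the classes in the question (the real db class is not shown), and knownIds is assumed to be pre-loaded with the IDs that are already stored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class Tweet {
    private final long id;
    private final String text;

    Tweet(long id, String text) {
        this.id = id;
        this.text = text;
    }

    long getId() { return id; }
    String getText() { return text; }
}

// Hypothetical stand-in for the question's db object; it only records inserts.
class FakeTweetDb {
    final List<String> rows = new ArrayList<>();

    void insertTweet(String text, long id) {
        rows.add(id + ":" + text);
    }
}

final class TweetImporter {
    private TweetImporter() {}

    // knownIds holds the IDs already stored; it is updated as tweets are inserted.
    // Returns the number of tweets that were actually inserted.
    static int insertNew(List<Tweet> tweets, Set<Long> knownIds, FakeTweetDb db) {
        int inserted = 0;
        for (Tweet tweet : tweets) {
            // Set.add returns false if the ID was already present.
            if (knownIds.add(tweet.getId())) {
                db.insertTweet(tweet.getText(), tweet.getId());
                inserted++;
            }
        }
        return inserted;
    }
}
```

With an empty knownIds set every tweet is inserted once, which also covers the first run where nothing has been stored yet.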

Your syntax is not correct. It should be like that:
for (Tweet tweet : tweets) {
for(long forId : idFromArray){
long tweetId = tweet.getId();
if(forId != tweetId){
String twitterString = tweet.getText();
db.insertTwitter(twitterString);
}
}
}
EDIT
This answer no longer really answers the question since it was updated ;)

The simplest solution would be to set a boolean variable to true at the point where you currently do the insert, and then, in the outer loop, check it and insert the tweet there if the boolean is true.

for (Tweet : tweets){ ...
should really be
for(Tweet tweet: tweets){...

So you really want:
for each tweet
unless tweet is in db
insert tweet
If so, just write it down in your programming language.
Hint: The loop over the array is to be done before the insert, which is done depending on the outcome.
What you want to test is that all array elements are not equal to the current one. But your for loop does not do that.
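Written out in Java, that hint might look like the following sketch: the inner loop only decides whether the ID is new, and the insert decision is made once, after the loop. IdCheck and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class IdCheck {
    private IdCheck() {}

    // True only if no element of knownIds equals tweetId.
    // An empty array therefore means "everything is new".
    static boolean isNew(long tweetId, long[] knownIds) {
        for (long known : knownIds) {
            if (known == tweetId) {
                return false;
            }
        }
        return true;
    }

    // Returns the IDs that would be inserted, in their original order.
    static List<Long> idsToInsert(long[] tweetIds, long[] knownIds) {
        List<Long> toInsert = new ArrayList<>();
        for (long tweetId : tweetIds) {
            if (isNew(tweetId, knownIds)) {
                toInsert.add(tweetId); // the insert happens once, outside the inner loop
            }
        }
        return toInsert;
    }
}
```

Because an empty array makes isNew return true, the case from the question where idFromArray starts out empty no longer skips every tweet.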
