I just came across a question when using a List and its stream() method. While I know how to use them, I'm not quite sure about when to use them.
For example, I have a list, containing various paths to different locations. Now, I'd like to check whether a single, given path contains any of the paths specified in the list. I'd like to return a boolean based on whether or not the condition was met.
This, of course, is not a hard task per se. But I wonder whether I should use streams or a for(-each) loop.
The List
private static final List<String> EXCLUDE_PATHS = Arrays.asList(
"my/path/one",
"my/path/two"
);
Example using Stream:
private boolean isExcluded(String path) {
return EXCLUDE_PATHS.stream()
.map(String::toLowerCase)
.filter(path::contains)
.collect(Collectors.toList())
.size() > 0;
}
Example using for-each loop:
private boolean isExcluded(String path){
for (String excludePath : EXCLUDE_PATHS) {
if (path.contains(excludePath.toLowerCase())) {
return true;
}
}
return false;
}
Note that the path parameter is always lowercase.
My first guess is that the for-each approach is faster, because the loop returns immediately once the condition is met, whereas the stream still iterates over all list entries to complete the filtering.
Is my assumption correct? If so, why (or rather when) would I use stream() then?
Your assumption is correct. Your stream implementation is slower than the for-loop.
This stream usage should be as fast as the for-loop though:
EXCLUDE_PATHS.stream()
.map(String::toLowerCase)
.anyMatch(path::contains);
This iterates through the items, applying String::toLowerCase and the path::contains test to them one by one, and terminates at the first item that matches.
Both collect() and anyMatch() are terminal operations. However, anyMatch() exits at the first matching item, while collect() requires all items to be processed.
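To make the short-circuiting visible, here is a small sketch that counts how many elements reach toLowerCase under each terminal operation. The three-element list, the class name and the AtomicInteger counter are illustrative additions, not part of the question's code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

class ShortCircuitDemo {
    // Illustrative list; for "my/path/one/file" the match is the first entry.
    static final List<String> EXCLUDE_PATHS =
            Arrays.asList("my/path/one", "my/path/two", "my/path/three");

    // Returns how many elements were mapped when using collect().size() > 0.
    static int mappedWithCollect(String path) {
        AtomicInteger mapped = new AtomicInteger();
        boolean found = EXCLUDE_PATHS.stream()
                .map(s -> { mapped.incrementAndGet(); return s.toLowerCase(); })
                .filter(path::contains)
                .collect(Collectors.toList())
                .size() > 0; // result ignored here; only the count matters
        return mapped.get();
    }

    // Returns how many elements were mapped when using anyMatch().
    static int mappedWithAnyMatch(String path) {
        AtomicInteger mapped = new AtomicInteger();
        boolean found = EXCLUDE_PATHS.stream()
                .map(s -> { mapped.incrementAndGet(); return s.toLowerCase(); })
                .anyMatch(path::contains); // stops at the first match
        return mapped.get();
    }
}
```

For a path that matches the first entry, the collect() version maps all three entries, while anyMatch() stops after one.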
The decision whether to use Streams or not should not be driven by performance considerations, but rather by readability. When it really comes to performance, there are other considerations.
With your .filter(path::contains).collect(Collectors.toList()).size() > 0 approach, you are processing all elements and collecting them into a temporary List before comparing the size. Still, this hardly ever matters for a Stream consisting of two elements.
Using .map(String::toLowerCase).anyMatch(path::contains) can save CPU cycles and memory, if you have a substantially larger number of elements. Still, this converts each String to its lowercase representation, until a match is found. Obviously, there is a point in using
private static final List<String> EXCLUDE_PATHS =
Stream.of("my/path/one", "my/path/two").map(String::toLowerCase)
.collect(Collectors.toList());
private boolean isExcluded(String path) {
return EXCLUDE_PATHS.stream().anyMatch(path::contains);
}
instead, so you don’t have to repeat the conversion to lowercase in every invocation of isExcluded. If the number of elements in EXCLUDE_PATHS or the lengths of the strings become really large, you may consider using
private static final List<Predicate<String>> EXCLUDE_PATHS =
Stream.of("my/path/one", "my/path/two").map(String::toLowerCase)
.map(s -> Pattern.compile(s, Pattern.LITERAL).asPredicate())
.collect(Collectors.toList());
private boolean isExcluded(String path){
return EXCLUDE_PATHS.stream().anyMatch(p -> p.test(path));
}
Compiling a string as a regex pattern with the LITERAL flag makes it behave just like ordinary string operations, but allows the engine to spend some time in preparation, e.g. using the Boyer-Moore algorithm, to be more efficient when it comes to the actual comparison.
Of course, this only pays off if there are enough subsequent tests to compensate for the time spent in preparation. Determining whether this will be the case is one of the actual performance considerations, besides the first question of whether this operation will ever be performance-critical at all. The question of whether to use Streams or for loops is not one of them.
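A quick sketch (class and method names are illustrative) showing that a pattern compiled with Pattern.LITERAL treats metacharacters such as . as plain characters, so its asPredicate() result answers the same question as String.contains. It says nothing about speed; whether the preparation pays off would have to be measured:

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

class LiteralPatternDemo {
    // LITERAL disables regex syntax, so "a.b" only matches the three
    // characters 'a', '.', 'b'; asPredicate() tests via matcher(s).find(),
    // i.e. "does the input contain this text anywhere".
    static Predicate<String> literal(String s) {
        return Pattern.compile(s, Pattern.LITERAL).asPredicate();
    }
}
```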
By the way, the code examples above keep the logic of your original code, which looks questionable to me. Your isExcluded method returns true if the specified path contains any of the elements in the list, so it returns true for /some/prefix/to/my/path/one, as well as my/path/one/and/some/suffix or even /some/prefix/to/my/path/one/and/some/suffix.
Even dummy/path/onerous is considered to fulfill the criteria, as it contains the string my/path/one…
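If the intended meaning is "the path lies at or below one of the excluded directories", one option is to compare java.nio.file.Path name elements instead of raw substrings. This is a sketch under that assumption, not the asker's stated semantics; the class name and the relative example paths are illustrative:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class PrefixExclusion {
    // Assumption: a path is excluded only if it starts with one of these
    // directories, compared name element by name element.
    static final List<Path> EXCLUDE_DIRS = Arrays.asList(
            Paths.get("my/path/one"),
            Paths.get("my/path/two"));

    static boolean isExcluded(String path) {
        Path p = Paths.get(path);
        // Path.startsWith(Path) matches whole elements, so "onerous" != "one".
        return EXCLUDE_DIRS.stream().anyMatch(p::startsWith);
    }
}
```

Here my/path/one/and/some/suffix is excluded, while dummy/path/onerous, my/path/onerous and /some/prefix/to/my/path/one are not.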
Yeah. You are right. Your stream approach will have some overhead. But you may use such a construction:
private boolean isExcluded(String path) {
return EXCLUDE_PATHS.stream().map(String::toLowerCase).anyMatch(path::contains);
}
The main reason to use streams is that they make your code simpler and easier to read.
The goal of streams in Java is to simplify writing parallel code. It's inspired by functional programming. The serial stream is just there to make the code cleaner.
If we want performance, we should use parallelStream, which was designed for that. The serial one is, in general, slower.
There is a good article to read about ForLoop, Stream and ParallelStream Performance.
In your code, we can use a short-circuiting terminal operation such as anyMatch to stop the search at the first match.
Radical answer:
Never. Ever. Ever.
I almost never iterate a list for anything, especially not to find something, yet stream users and stream-based systems seem full of that way of coding.
I find such code difficult to refactor and organize, and I see redundancy and over-iteration everywhere in stream-heavy systems. In the same method you might see it five times: the same list, finding different things.
It is not really shorter, either; it rarely is. It is definitely not more readable, but that is a subjective opinion. Some people will say it is. I don't. People might like it because of autocompletion, but in my editor, IntelliJ, I can just type iter or itar and have the for loop generated for me, with types and everything.
It is often misused and overused, and I think it is better to avoid it completely. Java is not a true functional language, and Java generics suck: they are not expressive enough, and they are certainly more difficult to read, parse and refactor. Just try to read the source of any of the JDK's own stream classes. Do you find that easy to parse?
Also, stream code is not easily extractable or refactorable unless you want to start adding weird methods that return Optionals, Predicates, Consumers and whatnot, and you end up with methods returning and taking all kinds of weird generic constraints whose order and meaning only God knows.
Too much is inferred where you need to visit methods to figure out the types of various things.
Trying to make Java behave like a functional language such as Haskell or Lisp is a fool's errand. A heavily stream-based Java system is always going to be more complex than one without streams, way less performant, and more difficult to refactor and maintain.
Thus it is also buggier and filled with patchwork. There is glue work everywhere because of the redundancy such systems are often filled with. Some people just don't have an issue with redundancy. I am not one of them, and neither should you be.
When OpenJDK got involved, they started adding things to the language without thinking them through thoroughly enough. It is now not just Java Streams that is an issue. Systems are now inherently more complex because they require more base knowledge of these APIs. You might have it, but your colleagues don't. They sure as hell know what a for loop is and what an if block is.
Furthermore, since a lambda cannot assign to a local variable that is not (effectively) final, you can rarely do two things at the same time while looping, so you end up iterating twice, or thrice.
Most people who like and prefer the stream approach over a for loop probably started learning Java after Java 8. Those who started before hate it. The thing is that it is far more complex to use and refactor, and more difficult to use the right way. It requires skills to not fuck up, and then even more skills and energy to repair the fuck-ups.
And when I say it performs worse, it is not only in comparison to a for loop (which is also a very real thing), but more because of the tendency of such code to over-iterate a wide range of things. Iterating a list to find an item is deemed so easy that it tends to be done over and over again.
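As an illustration of the point about reassigning locals (the method and the data are hypothetical): a single for loop can update two accumulators in one pass, whereas a lambda passed to forEach cannot assign to blank or longest because they are not effectively final, so a straightforward stream rewrite tends to use two separate pipelines.

```java
import java.util.List;

class OnePassDemo {
    // Counts blank entries and finds the longest non-blank entry in one pass.
    // Both locals are reassigned inside the loop, which a lambda body
    // could not do directly.
    static String summarize(List<String> paths) {
        int blank = 0;
        String longest = "";
        for (String p : paths) {
            if (p.trim().isEmpty()) {
                blank++;
            } else if (p.length() > longest.length()) {
                longest = p;
            }
        }
        return blank + ":" + longest;
    }
}
```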
I've not seen a single system that has benefitted from it. All of the systems I have seen are horribly implemented, mostly because of it, and I've worked in some of the biggest companies in the world.
Code is definitely not more readable than a for loop, and a for loop is definitely more flexible and refactorable. The reason we see so many complex shitty systems and bugs everywhere today is, I promise you, the heavy reliance on streams to filter, not to mention the accompanying overuse of Lombok and Jackson. Those three are the hallmarks of a badly implemented system: keyword overuse, a patchwork approach.
Again, I consider it really bad to iterate a list to find anything. Yet in stream-based systems, this is what people do all the time. It is also not rare for an iteration to be O(N²) in a way that is difficult to parse and detect, while with a for loop you would see it immediately.
Where it used to be customary to ask the database to filter things for you, it is now not rare for a base query to return a big list of things instead, with all kinds of iterative logic and methods to filter out the undesirables, and of course they use streams to do this. All kinds of methods arise around that big list to filter out various things.
This often means redundant filtering, and thus redundant logic too. Over and over again.
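A minimal sketch of the kind of hidden quadratic cost described above, with hypothetical method names: the filter reads like a single pass, but List.contains scans the second list for every element, giving O(N·M) comparisons. Switching the lookup to a HashSet brings it to O(N + M) on average; the fix is a change of data structure, whichever looping style is used.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class HiddenQuadratic {
    // Looks like one pass, but allowed::contains is a linear scan of
    // 'allowed' for every element of 'ids': O(N * M) overall.
    static List<String> keepAllowedSlow(List<String> ids, List<String> allowed) {
        return ids.stream()
                .filter(allowed::contains)
                .collect(Collectors.toList());
    }

    // Same result, with average O(1) HashSet lookups: O(N + M) overall.
    static List<String> keepAllowedFast(List<String> ids, List<String> allowed) {
        Set<String> allowedSet = new HashSet<>(allowed);
        return ids.stream()
                .filter(allowedSet::contains)
                .collect(Collectors.toList());
    }
}
```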
Of course, I do not mean you. But your colleagues. Right?
Personally, I rarely ever iterate anything. I use the right data sets and rely on the database to filter them for me. Once. However, in a streams-heavy system you will see iteration everywhere.
In the deepest method, in the caller, caller of caller, caller of the caller of the caller. Streams everywhere. It is ugly. And good luck refactoring that code that lives in tiny lambdas. And good luck reusing them. Nobody will look to reuse your nice Predicates.
And if they want to use them, guess what? They need to use more Streams. You just got yourself addicted and cornered yourself further. Now, are you proposing I start splitting all of my code into tiny Predicates, Consumers, Functions and BiFunctions, just so I can reuse that logic in Streams?
Of course, I hate it just as much in JavaScript, where over-iteration by noob frontend developers is everywhere.
You might say the cost of iterating a list is nothing, but the system's complexity grows, redundancy increases, and therefore maintenance costs and the number of bugs increase. It becomes a patch-and-glue approach to various things: just add another filter and remove this, rather than code things the right way.
Furthermore, where you need three servers to host all of your users, I can manage with just one. So such a system will need to scale out way earlier than one that is not streams-heavy. For small projects, that is a very important metric. Where you can handle, say, 5000 concurrent users, my system can handle twice or thrice that.
I have no need for it in my code, and when I am in charge of new projects, the first rule is that streams are totally forbidden to use.
That is not to say there are no use cases for it or that it might not be useful at times, but the risks associated with allowing it far outweigh the benefits.
When you start using Streams you are essentially adopting a whole new programming paradigm. The entire programming style of the system will change and that is what I am concerned about.
You do not want that style. It is not superior to the old style. Especially on Java.
Take the Futures API as an example.
Sure, you could start coding everything to return a Promise or a Future, but do you really want to? Is that going to resolve anything? Can your entire system really follow up on being that, everywhere?
Will it be better for you, or are you just experimenting and hoping you will benefit at some point?
There are people who overdo RxJava and overdo promises in JavaScript as well. There are really, really few cases where you want things to be futures-based, and you will hit very many corner cases where those APIs have certain limitations, and then you are stuck.
You can build really really complex and far far more maintainable systems without all that crap.
This is what it is about. It is not about your hobby project expanding and becoming a horrible code base.
It is about what the best approach is to build large and complex enterprise systems and ensure they remain coherent, consistent, refactorable, and easily maintainable.
Furthermore, rarely are you ever working on such systems on your own.
You are very likely working with at least 10 people, all experimenting and overdoing Streams.
So while you might know how to use them properly, you can rest assured the other 9 really don't. They just love experimenting and learning by doing.
I will leave you with these wonderful examples of real code, with thousands more similar to them:
Try refactoring any of the above. I challenge you. Give it a try. Everything is a Stream, everywhere. This is what Stream developers do, they overdo it, and there is no easy way to grasp what the code is actually doing. What is this method returning, what is this transformation doing, what do I end up with. Everything is inferred. Much more difficult to read for sure.
If you understand this, then you must be an Einstein, but you should know that not everyone is like you, and this could be your system in the very near future.
Do note that this is not isolated to this one project; I've seen many projects with structures very similar to these.
One thing is for sure, horrible coders love streams.
Others have already made many good points, but I just want to mention lazy evaluation in streams. When we call map() to create a stream of lowercase paths, we are not creating the whole stream immediately; instead, the stream is evaluated lazily, which is why the performance should be equivalent to the traditional for loop. It does not do a full scan: map() and anyMatch() are applied element by element, interleaved, and once anyMatch() finds a match and returns true, the pipeline short-circuits.
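The interleaving can be observed with a small sketch that logs each step (the class name, the mixed-case sample list and the log strings are made up for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LazyDemo {
    // Records the order in which map() and anyMatch() see each element.
    static List<String> trace(String path) {
        List<String> log = new ArrayList<>();
        boolean found = Arrays.asList("My/Path/One", "my/path/two", "my/path/three")
                .stream()
                .map(s -> { log.add("map " + s); return s.toLowerCase(); })
                .anyMatch(s -> { log.add("test " + s); return path.contains(s); });
        log.add("result " + found);
        return log;
    }
}
```

Calling trace("my/path/two/x") records map and test alternating for the first two entries, then the result; my/path/three is never mapped.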
I recently saw this question and thought that the person who asked the question is correct to some degree. The answer stated that we should not use assertions to perform any tasks in our program.
But assertions can act as easy one-liners for maintaining loop invariants and program invariants, so that we can check program correctness to a degree.
And why are assertions necessary even if we have if-else? An assertion just tests a Boolean expression; similar things could be done with if-else ladders, so why bother creating a new keyword, assert?
A 'task in your program', in context, means something that should be done, and ideally tested for.
Not only is:
assert p != null
shorter and simpler than:
if (p == null) throw new IllegalArgumentException("p is null");
but making it an assert also clearly documents the fact that it is an internal constraint, not a specified behavior. So you don't need another 4 lines testing it.
Of course, sometimes explicitly specified behavior is what you want, e.g. for public, long-lived APIs.
In other words, while they are similar, it's slightly wrong to use an assert where if/throw is correct, and vice versa.
Nevertheless, a lot of Java code doesn't bother with assert, as that leaves one less decision to make. I'm not sure it would be added to the language if it didn't exist...
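A small sketch of that split, using a hypothetical Account class: the public method validates its argument with if/throw because rejecting bad input is part of its contract, while the private helper only asserts an invariant its caller already guarantees.

```java
class Account {
    private long balance;

    // Public, specified behavior: callers may pass bad input, so check
    // explicitly and throw a documented exception.
    public void deposit(long amount) {
        if (amount <= 0) {
            throw new IllegalArgumentException("amount must be positive: " + amount);
        }
        balance = add(balance, amount);
    }

    // Internal helper: only called with values already validated above.
    // The assert documents that invariant and is only checked when the
    // JVM runs with -ea (-enableassertions).
    private static long add(long current, long amount) {
        assert amount > 0 : "caller must validate amount";
        return current + amount;
    }

    public long balance() {
        return balance;
    }
}
```

Without -ea, the assert line checks nothing; the IllegalArgumentException is thrown regardless of JVM flags.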
Is it generally considered bad practice to structure code with embedded expressions in method parameters? Should variables be declared instead?
(Android code snippet for an example)
((EditText)view.findViewById(R.id.fooEditText))
.setText(
someExpression
? getResources().getString(R.string.true_expression_text)
: getResources().getString(R.string.false_expression_text)
);
Personally I think it looks fine, but am just wondering if this is considered repulsing :)
I would almost certainly simplify that, in a number of ways:
EditText editText = (EditText) view.findViewById(R.id.fooEditText);
// R.string.* constants are int resource IDs, not Strings
int resourceId = someExpression ? R.string.true_expression_text
                                : R.string.false_expression_text;
editText.setText(getResources().getString(resourceId));
Doing it all in one statement makes it harder to read and harder to debug, IMO. Note that I've also removed duplication here, by using the fact that you were calling getResources().getString(...) in both operands of the conditional operator, just with different resource IDs.
My main beef with the original code is calling a method on the result of a cast - aside from anything else, it introduces more brackets than you need, which is generally confusing.
I'd say this depends on the situation. For instance,
player.setName(User.getName());
would be fine; however, a train wreck such as the one below...
player.setName(getGroup().getUsers().get(0).getName());
I'd say is bad practice, and it is mentioned in Clean Code by Bob Martin regarding the dangers of train wrecks. Duplicate calls, as mentioned by @Jon Skeet, are another reason to use a variable rather than repeating a method call.
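A plain-Java sketch of the train-wreck point, with minimal hypothetical User, Group and Player classes standing in for the real ones. Both methods do the same thing; the second names each step, so a failure such as an empty user list (an IndexOutOfBoundsException from get(0)) points at one obvious line:

```java
import java.util.List;

// Hypothetical domain classes, just enough to show the refactoring.
class User {
    private final String name;
    User(String name) { this.name = name; }
    String getName() { return name; }
}

class Group {
    private final List<User> users;
    Group(List<User> users) { this.users = users; }
    List<User> getUsers() { return users; }
}

class Player {
    private String name;
    void setName(String name) { this.name = name; }
    String getName() { return name; }
}

class TrainWreckDemo {
    // Train wreck: several chained calls inside one argument.
    static void assignInline(Player player, Group group) {
        player.setName(group.getUsers().get(0).getName());
    }

    // Same behavior with named intermediates that can each be inspected.
    static void assignStepwise(Player player, Group group) {
        List<User> users = group.getUsers();
        User firstUser = users.get(0);
        String name = firstUser.getName();
        player.setName(name);
    }
}
```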
The word "repulsing" was yours, but it certainly describes my reaction. I can't focus on what this statement is doing because it has an if statement, a search, and at least 5 dereferences happening before it gets started.
I find the ternary operator particularly pernicious, since I have to hold two disjoint sets of state in my mind while I parse everything else. Some folks prefer brevity to local variables (I'm not one of them), but ternary operators (or any other branch) embedded in other statements are especially unlovable. If you ignore the rest of Clean Code or similar works because you enjoy complex statements, at least separate the conditionals out.
A little background: I'm hitting a situation where I haven't been able to enable assertions (asked here), and because of this, using a great solution like forceassertions is not possible for me.
Assertions have always been a formidable weapon for us during the development and testing phase and we're not prepared to let it go.
That being the case, two options came to mind.
The first, much like JUnit's Assert class:
Assert.assertTrue(result.financialInfoDTO.getPeriods().size() <= FinancialInfoConstants.NUMBER_OF_VISIBLE_PERIOD);
The second, trying to mimic Java's native assert keyword behaviour where we can enable or disable it:
Assert.check(new Assertion() { // "assert" is a reserved word in Java, so the method needs another name
public boolean doAssert() { return result.financialInfoDTO.getPeriods().size() <= FinancialInfoConstants.NUMBER_OF_VISIBLE_PERIOD; }
});
I would like to have the luxury of enabling and disabling the assertion feature, and the only solution I can think of is something like the latter. What I am asking is: given that most assertions would be comparing the sizes of collections and comparing values of some sort, would we be better off using the first option or the latter?
To put it in a more technical context, which is more efficient: evaluating a simple expression, or creating new objects on the heap all the time?
Object instantiation is usually more expensive. You can benchmark it yourself.
I will choose option 1.
It avoids instantiating the anonymous class each time you pass through your code. With the second option, you are also accessing an object (result) defined outside of your assertion class.
That's the first reason; besides, don't you think the first one is more readable?
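A sketch of the second option, assuming Java 8 or later is available: a lambda replaces the anonymous Assertion class, and a hypothetical Check class with a static switch (read here from an assumed system property, checks.enabled) decides whether the condition is evaluated at all. A capturing lambda, such as one that reads result, may still allocate an object each time the expression is evaluated (the JIT can sometimes eliminate that), so this narrows the efficiency question rather than settling it; measuring is the only way to know.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper: a run-time switch standing in for -ea.
final class Check {
    // Assumption: enabled via -Dchecks.enabled=true; Boolean.getBoolean
    // returns true only if that system property equals "true".
    static volatile boolean enabled = Boolean.getBoolean("checks.enabled");

    private Check() {}

    // The condition is only evaluated when checks are enabled.
    static void that(BooleanSupplier condition, String message) {
        if (enabled && !condition.getAsBoolean()) {
            throw new AssertionError(message);
        }
    }
}
```

A call site would then look like Check.that(() -> result.financialInfoDTO.getPeriods().size() <= FinancialInfoConstants.NUMBER_OF_VISIBLE_PERIOD, "too many periods");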
I have always written my boolean expressions like this:
if (!isValid) {
// code
}
But my new employer insists on the following style:
if (false == isValid) {
// code
}
Is one style preferred, or standard?
I prefer the first style because it is more natural for me to read. It's very unusual to see the second style.
One reason why some people might prefer the second over another alternative:
if (isValid == false) { ... }
is that with the latter, if you accidentally write a single = instead of ==, you are assigning to isValid instead of testing it, whereas with the constant first you will get a compile error.
But with your first suggestion this issue isn't even a problem, so this is another reason to prefer the first.
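A minimal sketch of the typo being discussed (class and method names are illustrative). The condition isValid = false compiles because an assignment to a boolean is itself a boolean expression, whereas false = isValid would be rejected by the compiler:

```java
class AssignmentTypo {
    // Compiles: assigns false to isValid, then tests that value,
    // so the block never runs, whatever the argument was.
    static boolean typo(boolean isValid) {
        boolean entered = false;
        if (isValid = false) {   // meant: isValid == false
            entered = true;      // never reached
        }
        return entered;
    }

    // The intended comparison.
    static boolean intended(boolean isValid) {
        return isValid == false;
    }
}
```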
Absolutely the first. The second betrays a lack of understanding of the nature of expressions and values, and as part of the coding standard, it implies that the employer expects to hire very incompetent programmers - not a good omen.
Everybody recognizes this snippet:
if (isValid.toString().length() > 4) {
//code
}
I think your second example points in the same direction.
It was discussed for C# several hours ago.
The false == isValid construct is a leftover from the C world, where the compiler allows an assignment as an if condition. Java only accepts a boolean condition, so if (x = 5) is a compile error, but if (isValid = false) still compiles when isValid is a boolean.
Overall, the second option is too verbose.
IMO, the first one is much more readable, while the second one is more verbose.
I would surely go for the first one.
You are evaluating the variable, not false, so the latter is not correct from a readability perspective. So I would personally stick with the first option.
I'm going to attempt a comprehensive answer here that incorporates all the above answers.
The first style is definitely to be preferred for the following reasons:
it's shorter
it is more readable, and hence easier to understand
it is more widely used, which means that readers will recognize the pattern more quickly
"false==..." rather than "...==false" is yet another violation of natural order, which makes the reader think "is there something strange going on that I need to pay attention to", when there isn't.
The only exception to this is when the variable is a Boolean rather than a boolean. In that case null has to be considered: both !isValid and false == isValid unbox isValid and throw a NullPointerException when it is null. A null-safe comparison such as Boolean.FALSE.equals(isValid), which is true only for Boolean.FALSE and false for both Boolean.TRUE and null, is then a good argument for writing the comparison out explicitly.
The second style doesn't require you to negate the expression yourself (which might be far more complicated than just "isValid"). But writing "isValid == false" may lead to an unintended assignment if you forget to type the second =, hence the idiom of putting the operand that cannot be assigned to, such as the constant false, on the left side.
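A sketch of how a boxed Boolean behaves in the three forms (class name illustrative). Under Java's rules for boolean equality, comparing a boolean with a Boolean unboxes the Boolean, so false == isValid fails on null just like !isValid does; only an equals-based comparison is null-safe:

```java
class BoxedBooleanDemo {
    // Unboxes isValid: throws NullPointerException when isValid is null.
    static boolean negated(Boolean isValid) {
        return !isValid;
    }

    // boolean == Boolean also unboxes: same NullPointerException on null.
    static boolean comparedToFalse(Boolean isValid) {
        return false == isValid;
    }

    // Null-safe: true only for Boolean.FALSE; false for TRUE and for null.
    static boolean isFalse(Boolean isValid) {
        return Boolean.FALSE.equals(isValid);
    }
}
```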
The first style seems to be preferred among people who know what they're doing.
I just want to say I learned C twenty years ago in school and have moved on to Perl and Java and now C#, which all have the same syntax and...
I think (!myvar) is the most popular
I think (myvar==false) is just fine too
In 20 years I have NEVER EVEN SEEN
(false==myvar)
I think your boss is smoking something. I'm sorry, but I'd take this as a sign your boss is some kind of control freak or numbskull.