It is known that mapX has exactly one entry. A code metric tool complains about using 'break' inside for loops; normally, in such a small loop, I'd let it pass. In this case, though, since there is known to be only a single entry, I'm not sure which version is more efficient.
So, which of the following is more efficient when you know that the map will only ever contain one element?
Possible replacement:
Map.Entry<String,String> e = mapX.entrySet().iterator().next();
Y.setMsg(e.getValue());
Y.setMsgKey(e.getKey());
Original code:
for (String key : mapX.keySet()){
Y.setMsg(mapX.get(key));
Y.setMsgKey(key);
break;
}
The first version is faster than the second one (with the "for" loop / "break"):
The loop version has to call hasNext() before the call to next().
The loop version is iterating the keySet rather than the entryset, and therefore has to do an extra map lookup to get the corresponding value.
The loop version possibly has an extra branch instruction at the break ... though this is minor and can possibly be optimized away by the JIT compiler.
But the best reason for using the first version (IMO) is that the code is easier to understand / less complex. (The code metric tool is helpful in this case ...)
Of course, the flip side is that if the map is empty, the non-looping version of the code will throw an exception (a NoSuchElementException from Iterator.next()). If you need to deal with the "empty map" case, you should write it like this:
if (!mapX.isEmpty()) {
    Map.Entry<String,String> e = mapX.entrySet().iterator().next();
    Y.setMsg(e.getValue());
    Y.setMsgKey(e.getKey());
}
Personally I find the first variant more concise. If you're using Guava, you could also write it like this:
Map.Entry<String,String> e = Iterables.getOnlyElement(mapX.entrySet());
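Without Guava, a small helper can give a similar "exactly one element" guarantee. This is a sketch that mirrors the contract Guava documents for Iterables.getOnlyElement (NoSuchElementException if empty, IllegalArgumentException if there is more than one element); the class name OnlyEntry is hypothetical and this is not Guava's code.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical helper: returns the single entry of a map, failing loudly otherwise.
final class OnlyEntry {
    private OnlyEntry() {}

    static <K, V> Map.Entry<K, V> of(Map<K, V> map) {
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        if (!it.hasNext()) {
            throw new NoSuchElementException("map is empty");
        }
        Map.Entry<K, V> entry = it.next();
        if (it.hasNext()) {
            throw new IllegalArgumentException("map has more than one entry: " + map.size());
        }
        return entry; // entrySet gives key and value together, no extra get() lookup
    }
}
```

Usage would then be Map.Entry<String,String> e = OnlyEntry.of(mapX); followed by the two setter calls.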
This
Map.Entry<String,String> e = mapX.entrySet().iterator().next();
Y.setMsg(e.getValue());
Y.setMsgKey(e.getKey());
is the same as
for (String key : mapX.keySet()){
Y.setMsg(mapX.get(key));
Y.setMsgKey(key);
break;
}
with one exception: The loop first calls hasNext() on the iterator.
The former implies more strongly that there's only one element (at least, only one that you're interested in). The latter says "I'm going to loop through all elements...except I'm going to stop after the first one." Why use a loop, if you're not going to loop?
Seems like the non-looping version is the one to go with.
I'm fairly inexperienced with using objects so I would really like some input.
I'm trying to remove comments from a list that have certain "unwanted words" in them, both the comments and the list of "unwanted words" are in ArrayList objects.
This is inside a class called FormHelper, which has a private member comments (an ArrayList). The auditList ArrayList is created locally in a member function called populateComments(), which then calls the function below. populateComments() is called by the constructor, so this function only gets called once, when an instance of FormHelper is created.
private void filterComments(ArrayList <String> auditList) {
for(String badWord : auditList) {
for (String thisComment : this.comments) {
if(thisComment.contains(badWord)) {
int index = this.comments.indexOf(thisComment);
this.comments.remove(index);
}
}
}
}
Something about the way I implemented this doesn't feel right, and I'm also concerned that I'm using the ArrayList methods inefficiently. Is my suspicion correct?
It is not particularly efficient. However, finding a more efficient solution is not straightforward.
Let's step back to a simpler problem.
private void findBadWords(List <String> wordList, List <String> auditList) {
for(String badWord : auditList) {
for (String word : wordList) {
if (word.equals(badWord)) {
System.err.println("Found a bad word");
}
}
}
}
Suppose that wordList contains N words and auditList contains M words. Some simple analysis shows that the inner loop body is executed N x M times. The N factor is unavoidable, but the M factor is disturbing: it means that the more "bad" words you have to check for, the longer the check takes.
There is a better way to do this:
private void findBadWords(List <String> wordList, HashSet<String> auditWords) {
    for (String word : wordList) {
        if (auditWords.contains(word)) {
            System.err.println("Found a bad word");
        }
    }
}
Why is that better? It is better (faster) because HashSet::contains doesn't need to check all of the audit words one at a time. In fact, in the optimal case it will check none of them (!), and in the average case just one or two of them. (I won't go into why, but if you want to understand it, read the Wikipedia page on hash tables.)
But your problem is more complicated. You are using String::contains to test if each comment contains each bad word. That is not a simple string equality test (as per my simplified version).
What to do?
Well, one potential solution is to split the comments into an array of words (e.g. using String::split) and then use the HashSet lookup approach. However:
That changes the behavior of your code. (In a good way, actually: read up on the Scunthorpe problem!) You will now only match the audit words if they are actual words in the comment text.
Splitting a string into words is not cheap. If you use String::split it entails creating and using a Pattern object to find the word boundaries, creating substrings for each word and putting them into an array. You can probably do better, but it is always going to be a non-trivial calculation.
So the real question is whether the optimization is going to pay off. That will ultimately depend on the value of M, i.e. the number of bad words you are looking for. The larger M is, the more likely it is to be worthwhile to split the comments into words and use a HashSet to test them.
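A rough sketch of the split-and-lookup idea, assuming that "words" are maximal runs of \w characters (a choice made here for illustration; the question does not define word boundaries) and that audit words are compared exactly as given. The class WordFilter and method keepClean are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

final class WordFilter {
    // Assumption: words are separated by one or more non-word characters.
    private static final Pattern NON_WORD = Pattern.compile("\\W+");

    // Returns the comments that contain none of the audit words as whole words.
    static List<String> keepClean(List<String> comments, List<String> auditList) {
        Set<String> auditWords = new HashSet<>(auditList); // built once, O(M)
        List<String> kept = new ArrayList<>();
        for (String comment : comments) {
            boolean bad = false;
            for (String word : NON_WORD.split(comment)) {
                if (auditWords.contains(word)) { // expected O(1) per word
                    bad = true;
                    break;
                }
            }
            if (!bad) {
                kept.add(comment);
            }
        }
        return kept;
    }
}
```

With the audit words "darn" and "thorpe", a comment like "Scunthorpe is a town" survives, because "thorpe" only occurs inside a longer word.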
Another possible solution doesn't involve splitting the comments. You could take the list of audit words and assemble them into a single regex like this: \b(word-1|word-2|...|word-n)\b. Then use this regex with Matcher::find to search each comment string for bad words. The performance will depend on the optimizing capability of the regex engine in your Java platform. It has the potential to be faster than splitting.
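A sketch of the single-regex approach. Pattern.quote is an addition here (not part of the answer above) so that audit words containing regex metacharacters are matched literally; \b only behaves as expected when the audit words start and end with word characters. The class name AuditRegex is hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class AuditRegex {
    private final Pattern pattern; // null when there are no audit words

    AuditRegex(List<String> auditList) {
        if (auditList.isEmpty()) {
            // "\b(?:)\b" would match any comment containing a word character
            pattern = null;
        } else {
            String alternation = auditList.stream()
                    .map(Pattern::quote) // treat each audit word literally
                    .collect(Collectors.joining("|"));
            // \b(?:word-1|word-2|...|word-n)\b
            pattern = Pattern.compile("\\b(?:" + alternation + ")\\b");
        }
    }

    boolean containsBadWord(String comment) {
        return pattern != null && pattern.matcher(comment).find(); // one scan per comment
    }
}
```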
My advice would be to benchmark and profile your entire application before you start. Only optimize:
when the benchmarking says that the overall performance of the requests where this comment checking occurs is concerning. (If it is OK, don't waste your time optimizing.)
when the profiling says that this method is a performance hotspot. (There is a good chance that the real hotspots are somewhere else. If so, you should optimize them rather than this method.)
Note there is an assumption that you have (sufficiently) completed your application and created a realistic benchmark for it before you think about optimizing. (Premature optimization is a bad idea ... unless you really know what you are doing.)
As a general approach, removing individual elements from an ArrayList in a loop is inefficient, because it requires shifting all of the "following" elements along one position in the array.
A B C D E
  ^ if you remove this
    ^---^ you have to shift these 3 along by one
   / / /
A C D E
If you remove lots of elements, this will have a substantial impact on the time complexity. It's better to identify the elements to remove, and then remove them all at once.
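To make "remove them all at once" concrete, here is a sketch of in-place compaction for a random-access list such as ArrayList: each kept element is moved at most once, and the leftover tail is dropped with a single subList(...).clear(). The names are hypothetical, and this is an illustration of the idea, not a claim about how ArrayList.removeIf is implemented internally.

```java
import java.util.List;
import java.util.function.Predicate;

final class Compact {
    // Removes every element matching 'remove' with O(n) element moves.
    // Assumes a random-access list; on a LinkedList, get/set are O(n) each.
    static <T> void removeAllMatching(List<T> list, Predicate<? super T> remove) {
        int write = 0; // next slot for a kept element
        for (int read = 0; read < list.size(); read++) {
            T element = list.get(read);
            if (!remove.test(element)) {
                list.set(write++, element); // shift the kept element left
            }
        }
        list.subList(write, list.size()).clear(); // drop the tail in one go
    }
}
```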
I suggest that a neater way to do this would be using removeIf, which (at least for collection implementations such as ArrayList) does this "all at once" removal:
this.comments.removeIf(
c -> auditList.stream().anyMatch(c::contains));
This is concise, but probably quite slow because it has to keep checking the entire comment string to see if it contains each bad word.
A probably faster way would be to use regex:
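// needs: import java.util.regex.Pattern; import java.util.stream.Collectors;
if (!auditList.isEmpty()) { // an empty pattern would match (and remove) every comment
    Pattern p = Pattern.compile(
        auditList.stream()
            .map(Pattern::quote)
            .collect(Collectors.joining("|")));
    this.comments.removeIf(
        c -> p.matcher(c).find());
}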
This would be better because the compiled regex would search for all of the bad words in a single pass over each comment.
The other advantage of a regex-based approach is that you can check case insensitively, by supplying the appropriate flag when compiling the regex.
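A sketch of the case-insensitive variant, assuming you also want non-ASCII letters folded (hence the added Pattern.UNICODE_CASE flag, which is a choice made here, not part of the answer). The class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class CaseInsensitiveFilter {
    static void removeBadComments(List<String> comments, List<String> auditList) {
        if (auditList.isEmpty()) {
            return; // an empty alternation would match every comment
        }
        Pattern p = Pattern.compile(
                auditList.stream().map(Pattern::quote).collect(Collectors.joining("|")),
                // CASE_INSENSITIVE alone folds ASCII only; UNICODE_CASE extends it
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        comments.removeIf(c -> p.matcher(c).find());
    }
}
```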
I would like an answer that explains why the idea described below, shown on a very simple example, is commonly considered bad, and what its weaknesses are.
I have a sentence of words, and my goal is to convert every second word to uppercase. My starting point is exactly the same in both cases:
String sentence = "Hi, this is just a simple short sentence";
String[] split = sentence.split(" ");
The traditional and procedural approach is:
StringBuilder stringBuilder = new StringBuilder();
for (int i=0; i<split.length; i++) {
if (i%2==0) {
stringBuilder.append(split[i]);
} else {
stringBuilder.append(split[i].toUpperCase());
}
if (i<split.length-1) { stringBuilder.append(" "); }
}
When I want to use Java streams, I'm limited by the requirement that variables used in a lambda expression be final or effectively final. I have to use the workaround of an array and its first and only index, which was suggested in the first comment on my question How to increment a value in Java Stream. Here is the example:
int index[] = {0};
String result = Arrays.stream(split)
.map(i -> index[0]++%2==0 ? i : i.toUpperCase())
.collect(Collectors.joining(" "));
Yes, it's a bad solution, and I have heard a few good reasons, hidden somewhere in the comments of a question I can no longer find (if you remind me of some of them, I'd upvote twice if I could). But what if I use AtomicInteger? Does it make any difference, and is it a good and safe way with no side effects compared to the previous one?
AtomicInteger atom = new AtomicInteger(0);
String result = Arrays.stream(split)
.map(i -> atom.getAndIncrement()%2==0 ? i : i.toUpperCase())
.collect(Collectors.joining(" "));
Regardless of how ugly it might look to anyone, I'm asking for a description of the possible weaknesses and their reasons. I don't care about performance, only about the design and possible weaknesses of the second solution.
Please don't associate AtomicInteger with multithreading issues. I used this class because it retrieves, increments and stores the value in the way I need for this example.
As I often say in my answers, the Java Stream API is not a silver bullet. My goal is to explore and find where that statement applies, since I find the last snippet quite clear, readable and brief compared to the StringBuilder snippet.
Edit: Is there any alternative approach, applicable to the snippets above and free of these issues, for when I need to work with both the item and its index while iterating with the Stream API?
The documentation of the java.util.stream package states that:
Side-effects in behavioral parameters to stream operations are, in general, discouraged, as they can often lead to unwitting violations of the statelessness requirement, as well as other thread-safety hazards.
[...]
The ordering of side-effects may be surprising. Even when a pipeline is constrained to produce a result that is consistent with the encounter order of the stream source (for example, IntStream.range(0,5).parallel().map(x -> x*2).toArray() must produce [0, 2, 4, 6, 8]), no guarantees are made as to the order in which the mapper function is applied to individual elements, or in what thread any behavioral parameter is executed for a given element.
This means that the elements may be processed out of order, and thus the Stream-solutions may produce wrong results.
This is (at least for me) a killer argument against the two Stream-solutions.
By the process of elimination, we only have the "traditional solution" left. And honestly, I do not see anything wrong with this solution. If we wanted to get rid of the for-loop, we could re-write this code using a foreach-loop:
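StringBuilder stringBuilder = new StringBuilder();
boolean toUpper = false; // 1st String is not capitalized
String separator = "";   // no space before the first word
for (String word : split) {
    stringBuilder.append(separator).append(toUpper ? word.toUpperCase() : word);
    separator = " ";
    toUpper = !toUpper;
}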
For a streamified and (as far as I know) correct solution, take a look at Octavian R.'s answer.
Your question wrt. the "limits of streams" is opinion-based.
The answer to the question(s) ends here. The rest is my opinion and should be regarded as such.
In Octavian R.'s solution, an artificial index set is created through an IntStream, which is then used to access the String[]. For me, this has a higher cognitive complexity than a simple for- or foreach-loop, and I do not see any benefit in using streams instead of loops in this situation.
In Java, compared with Scala, you have to be inventive. One solution without mutation is this one:
String sentence = "Hi, this is just a simple short sentence";
String[] split = sentence.split(" ");
String result = IntStream.range(0, split.length)
.mapToObj(i -> i%2==0 ? split[i] : split[i].toUpperCase())
.collect(Collectors.joining(" "));
System.out.println(result);
In Java streams you should avoid mutation. Your solution with AtomicInteger is ugly and bad practice.
Kind regards!
As explained in Turing85’s answer, your stream solutions are not correct, as they rely on the processing order, which is not guaranteed. This can lead to incorrect results with parallel execution today, but even if it happens to produce the desired result with a sequential stream, that’s only an implementation detail. It’s not guaranteed to work.
Besides that, there is no advantage in rewriting code to use the Stream API with logic that is basically still a loop, just obfuscated with a different API. The best way to describe the idea of the new APIs is to say that you should express what to do, but not how.
Starting with Java 9, you could implement the same thing as
String result = Pattern.compile("( ?+[^ ]* )([^ ]*)").matcher(sentence)
.replaceAll(m -> m.group(1)+m.group(2).toUpperCase());
which expresses the wish to replace every second word with its upper case form, but doesn’t express how to do it. That’s up to the library, which likely uses a single StringBuilder instead of splitting into an array of strings, but that’s irrelevant to the application logic.
As long as you’re using Java 8, I’d stay with the loop, and even when switching to a newer Java version, I would not consider replacing the loop an urgent change.
The pattern in the above example has been written in a way to do exactly the same as your original code splitting at single space characters. Usually, I’d encode “replace every second word” more like
String result = Pattern.compile("(\\w+\\W+)(\\w+)").matcher(sentence)
.replaceAll(m -> m.group(1)+m.group(2).toUpperCase());
which would behave differently when encountering multiple spaces or other separators, but usually is closer to the actual intention.
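A runnable sketch of the Java 9+ version. One detail worth knowing: the String returned by the function passed to Matcher.replaceAll is itself interpreted as a replacement string, so '$' and '\' in a word would be treated specially. Wrapping it in Matcher.quoteReplacement (an addition, not part of the answer above) keeps the text literal. The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EverySecondWord {
    // Same pattern as above: mirrors splitting at single space characters.
    private static final Pattern SPACE_SEPARATED = Pattern.compile("( ?+[^ ]* )([^ ]*)");

    static String upperEverySecond(String sentence) {
        return SPACE_SEPARATED.matcher(sentence)
                // quoteReplacement stops "$5" from being read as a group reference
                .replaceAll(m -> Matcher.quoteReplacement(m.group(1) + m.group(2).toUpperCase()));
    }
}
```

For the sentence from the question this yields "Hi, THIS is JUST a SIMPLE short SENTENCE", the same as the original loop.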
I'm writing some code, and I want it to be well documented.
There is a part of the code where I check that, when there is an attempt to insert a new element E into a list L, E is unique (i.e. no other element in L is equal to it).
I'm having difficulty writing a user-friendly mathematical comment, something that looks like the example below:
The function will set the color field of every element X in list L to "Black", but only if X.size > 10.
So in that case I would write the comment:
[ X.color="Black" | X in L, X.size > 10]
But for the scenario above I couldn't find a satisfactory mathematical comment.
A mathematical set by definition has no duplicates, so perhaps using a Set rather than a List would solve your problem.
However, if that's too hard to change now, you could write something like:
[ L.insert(E) | E not in L ]
where E is the element and L is the list.
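In Java terms, that comment could sit on a small guard method like the hypothetical insertUnique below. The Set alternative from the first paragraph needs no guard at all, since Set.add already returns false when an equal element is present; insertIntoSet is included only to show that.

```java
import java.util.List;
import java.util.Set;

final class UniqueInsert {
    /**
     * Inserts e into list only if no equal element is present.
     * [ L.insert(E) | E not in L ]
     *
     * @return true if e was inserted
     */
    static <E> boolean insertUnique(List<E> list, E e) {
        if (list.contains(e)) { // O(n) scan using equals()
            return false;
        }
        return list.add(e);
    }

    // With a Set, the uniqueness rule is part of the contract of add().
    static <E> boolean insertIntoSet(Set<E> set, E e) {
        return set.add(e); // false if an equal element already exists
    }
}
```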
An exhaustive answer to your question requires two observations:
Best coding practices require you to know collections very well and when to use them, so that you pick the right collection for the right job. In this case, as advised in other comments, you need to use a Set instead of a List. A HashSet, for example, uses a HashMap under the hood, storing your elements as keys with a shared dummy value. Every time you add an element to the set, its hash code is computed to find the right bucket, and equals() is used to compare it with the elements already in that bucket, so no duplicates are allowed.
I really appreciate the fact that you want to write good comments, however you don't need to for the following reasons:
List and Set behaviour is already well documented, so nobody expects you to comment on it;
Books like Refactoring and Clean Code teach us that good code should not need comments, as it should be self-explanatory. That means that your method/class/variable names should tell me what the code is doing.
Is the following version of the for loop possible (or a variation of it that shortens the code by one line)?
for(String string: stringArray; string.toLowerCase()){
//stuff
}
Instead of
for(String string: stringArray){
string = string.toLowerCase();
//stuff
}
May seem like a stupid question but that one line is tiresome to write all the time when it applies to every element of the loop.
Write it like this
for (String string : stringArray) { string = string.toLowerCase(); /* stuff */ }
This is just as short. Also, in a normal for loop such as for(int i=0;i<40;i++), you can separate several expressions with commas in the initialization and update parts to keep everything on one line.
No, there isn't.
The trick with the enhanced-for loop is that it behaves like any other loop over a collection - you're working with the individual elements one at a time, as opposed to all at once.
Furthermore, since toLowerCase() returns a new String, as it should, it should only be called in situations where it's absolutely needed, as opposed to creating a new variable for that (unless you need it in more places, in which case it's better to move the lower-case functionality into those methods).
You should consider refactoring your code into several methods each with their own loops. One method creates a new array (or list) with transformed elements from the original list (such as applying toLowerCase() to the Strings in an array). The other methods process the new array rather than the original.
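A minimal sketch of that refactoring. The names are hypothetical; countStartingWith merely stands in for the "//stuff" in the question, and Locale.ROOT is an added choice to make lower-casing independent of the default locale.

```java
import java.util.Arrays;
import java.util.Locale;

final class LowerCaseFirst {
    // Step 1: build a new array with every element lower-cased.
    static String[] toLowerCase(String[] source) {
        return Arrays.stream(source)
                .map(s -> s.toLowerCase(Locale.ROOT)) // locale-independent lower-casing
                .toArray(String[]::new);
    }

    // Step 2: process the transformed array; the loop needs no extra line.
    static int countStartingWith(String[] lowerCased, String prefix) {
        int count = 0;
        for (String string : lowerCased) {
            if (string.startsWith(prefix)) {
                count++;
            }
        }
        return count;
    }
}
```

A caller would write String[] lower = LowerCaseFirst.toLowerCase(stringArray); once and then pass lower to every method that needs lower-cased input.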
Unfortunately that's not possible. You could take a look at Google Guava, which has something along these lines (its Function/Predicate utilities), but it doesn't help much in improving your code.
Completely off-topic maybe, but if you used Groovy, which is fully compatible with Java, it might help; it would be something like:
String[] stringArray = ["Lower", "Case"] as String[]
stringArray.collect { it.toLowerCase() }.each { item ->
println item
}
Which would print:
lower
case
But, like I said, this might not be a viable option in your case.
I don't think that's possible as of now. :)
I have created a class called Month that extends Calendar, which I am using to hold events for a given month. I have several years worth of Month objects stored in a TreeSet. Most of the events I want to record last for several months and are specified only by their start month (and a duration, in months). I want to be able to do this:
for ( Event e : events )
{
for ( Month aMonth : myMonths )
{
// check if this is a start month for e
// if it is, call aMonth.addEvent(e);
// and also add e to the next e.getDuration() months of myMonths
// and then start checking again from the month where we called addEvent(e)
// in case there is another occurrence of e starting
// before the first one has finished
}
}
It's the last part (starting to check again from the month where we called addEvent(e)) that I'm having trouble with. I tried using an iterator instead of a foreach loop, and having a separate loop when a start date was found which used the iterator to add e to the next x months, but I couldn't then get the iterator back to where it started. It looks like ListIterator has a previous() method, but I wanted to use a SortedSet rather than a List, to ensure duplicates are avoided (although perhaps this inclination is wrong?)
It feels like it would be much easier to do this with a plain old array, but the collection is useful for other parts of the program.
Perhaps I can use multiple iterators and just "use them up" as needed for these forays into months just beyond my main "bookmark" iterator? Doesn't exactly seem elegant though.
Or is there a hack for "peeking" beyond where my iterator is actually pointing?
I am quite new to all this, so I may just be making a design error. All advice welcomed!
What do the Event and Month objects look like? I think what you had will work:
// Pseudocode: assumes Months can be compared and offset by a number of months
for(Event event : events) {
    for(Month aMonth : myMonths) {
        if(aMonth >= event.startMonth && aMonth <= event.startMonth+event.duration) {
            aMonth.addEvent(event);
        }
    }
}
Alternatively you could flip it around and go the other way, make your outer iterator the Months and your inner iterator the Events. That should work too, the if() condition would probably be the same.
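A runnable version of the naive nested loop, under loudly stated assumptions: java.time.YearMonth stands in for the asker's Month class, Event is a hypothetical minimal class holding a start month and a duration in months (counted as in the question: the start month plus the next duration months), and because YearMonth is immutable, events are collected in a map instead of calling addEvent on the month.

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;

final class NaiveEventSpread {
    // Hypothetical stand-in for the question's Event.
    static final class Event {
        final String name;
        final YearMonth start;
        final int duration; // months after the start month
        Event(String name, YearMonth start, int duration) {
            this.name = name;
            this.start = start;
            this.duration = duration;
        }
    }

    // For each event, record it in every month in [start, start + duration]; O(m * n).
    static Map<YearMonth, List<String>> spread(List<Event> events, NavigableSet<YearMonth> months) {
        Map<YearMonth, List<String>> byMonth = new TreeMap<>();
        for (Event event : events) {
            YearMonth end = event.start.plusMonths(event.duration);
            for (YearMonth month : months) {
                if (!month.isBefore(event.start) && !month.isAfter(end)) {
                    byMonth.computeIfAbsent(month, k -> new ArrayList<>()).add(event.name);
                }
            }
        }
        return byMonth;
    }
}
```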
Depending on how many events and months you're dealing with at a time, the naive approach may well be the best. I'd start by just letting your for loops handle the iterators, and accepting the fact that you'll be doing (m*n) iterations. Then if you find that this spot is causing a significant slow-down you can try a few other techniques to speed things up without making your code overly complex.
Trying to seek ahead and back will make your code difficult to understand and more prone to bugs, without necessarily gaining you much in terms of performance. Usually you won't notice a significant difference in performance until you're talking about at least hundreds of items in both collections (in which case you could start with something simple like breaking your data up into years, for example, to reduce the overhead of the double-nested for loops).
Edit
However, since I just can't help myself, here's a semi-elegant strategy that will take advantage of the fact that your events and months are both stored in ascending order (I'm assuming events are stored in order of their start date). It uses a LinkedList (which is very efficient at adding and removing elements from the front and back of the list) to keep track of which months the current event might span, and then breaks as soon as it finds a month that the event doesn't include:
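LinkedList<Month> monthList = new LinkedList<Month>();
Iterator<Month> i = myMonths.iterator(); // walks the sorted months; monthList is only the window
for(Event ev : events)
{
    shiftList(monthList, i, ev);
    for(Month m : monthList)
    {
        if (!isInMonth(ev, m)) break;
        m.addEvent(ev);
    }
}
...
// Remove months that are not in scope from the front of the list.
// Add months that are in scope to the end of the list.
// Assumes events are sorted by start, months are contiguous,
// and every event falls within the range of myMonths.
public void shiftList(LinkedList<Month> monthList, Iterator<Month> i, Event ev)
{
    while(!monthList.isEmpty() && !isInMonth(ev, monthList.getFirst()))
    {
        monthList.removeFirst();
    }
    while(i.hasNext() && (monthList.isEmpty() || isInMonth(ev, monthList.getLast())))
    {
        Month next = i.next();
        // With an empty window, an out-of-scope month lies before the event's start: skip it.
        if (monthList.isEmpty() && !isInMonth(ev, next)) continue;
        monthList.addLast(next);
    }
}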
Again, you can see how much more complicated this is: it's very likely I introduced a bug on this logic, and I wouldn't feel comfortable using this in production without thorough unit-testing. You're generally much better off just keeping it simple until you have a compelling reason to optimize.
Google's Guava library provides a PeekingIterator that allows single-element peek-ahead. Create one via Iterators.peekingIterator(Iterator).
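If adding Guava is not an option, single-element lookahead is easy to sketch by hand. The class below is a hypothetical stand-in that mimics the peek()/next() behaviour Guava describes for PeekingIterator; it is not Guava's code.

```java
import java.util.Iterator;

final class Peeking<T> implements Iterator<T> {
    private final Iterator<? extends T> delegate;
    private boolean hasPeeked; // true when 'peeked' holds the next element
    private T peeked;

    Peeking(Iterator<? extends T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        return hasPeeked || delegate.hasNext();
    }

    // Returns the next element without advancing the iteration.
    T peek() {
        if (!hasPeeked) {
            peeked = delegate.next(); // throws NoSuchElementException when exhausted
            hasPeeked = true;
        }
        return peeked;
    }

    @Override
    public T next() {
        if (!hasPeeked) {
            return delegate.next();
        }
        T result = peeked;
        hasPeeked = false;
        peeked = null;
        return result;
    }
}
```

In the month loop, peek() lets you look at the following month to decide whether an event extends into it, without consuming that month from the main iteration.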