Java 8 stream vs List [closed] - java

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I have a set of private methods that are used by a main public method of a class (the public method receives a list). These methods mainly use the classic Java 8 stream operations such as filter, map, count, etc. I am wondering whether creating the stream once in the public API and passing it to the other methods, instead of passing the list, has any performance benefits or other considerations, since .stream() would be called only once.

Calling stream() or any intermediate operation does no actual work, as streams are driven by the terminal operation.
So passing a Stream internally from one method to another is not bad IMO, and might make the code cleaner. But don't return a Stream externally from your public methods; return a List instead (please read the supplied comments, this might not hold in all cases).
Also consider the case of applying a filter, collecting the result with toList, and then streaming that filtered List again only to map it later: that is obviously a bad choice. You are collecting too soon, so don't chain methods like this, even internally.
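A minimal sketch of both points, assuming a hypothetical NameFilters class (the class, method names and the length-based filter are made up for illustration). The private helpers take and return a Stream, so the pipeline is collected exactly once, and the last method shows that nothing runs without a terminal operation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example class; none of these names come from the question.
final class NameFilters {

    // Public entry point: takes a List and returns a List, the Stream stays internal.
    static List<String> longNamesUpperCased(List<String> names) {
        return upperCased(longerThanThree(names.stream()))
                .collect(Collectors.toList()); // the only terminal operation
    }

    // Helpers pass the Stream along, so nothing is collected in between.
    private static Stream<String> longerThanThree(Stream<String> names) {
        return names.filter(n -> n.length() > 3);
    }

    private static Stream<String> upperCased(Stream<String> names) {
        return names.map(String::toUpperCase);
    }

    // Laziness check: records which elements the pipeline actually touched.
    static List<String> inspectedWithoutTerminalOp(List<String> names) {
        List<String> seen = new ArrayList<>();
        names.stream().peek(seen::add).filter(n -> n.length() > 3); // no terminal operation
        return seen; // nothing was processed, so this stays empty
    }
}
```

Whether this is faster than passing the List around depends on what the helpers would otherwise do; the main gain is avoiding intermediate collections, not saving calls to stream().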

In general it's best to ask for what the method actually needs. If you turn the list into a stream every time you receive one, ask for the stream instead (streams can come from things other than lists). This enhances portability and reusability and lets consumers of your API know the requirements upfront.
Ask for exactly what you need, nothing more, nothing less.

Related

Java method return type, predefined Collection vs Collector? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I just have a question regarding code reusability in method return type.
In Java 8 there is the concept of collectors, where the user specifies the type of collection the stream will be collected into.
Would it be beneficial if base retrieval methods accepted a Collector parameter instead of returning a predefined Collection, say a List?
The method with the predefined Collection would then pass Collectors.toList() to the base retrieval method.
Because java.util.stream.Stream provides the functionality that you describe, there's little reason for any other class to do the same: instead of taking a Collector, collecting a stream into it, and returning the result, it makes much more sense to just return the stream to begin with: it's clearer, and it gives the caller much more flexibility than just the choice of collection type.
That said, in most cases you're better off just returning an appropriate collection type. You usually have a better sense than your clients of what collection-types make sense for a given API, and your clients don't usually care that much unless there's some reason that they would want to mutate the collection afterward. If you just return a Stream, you're giving your clients less information about what to expect (unless you then compensate for it by putting the information in the Javadoc, that they then have to read and make sense of).
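The three options being compared can be sketched side by side. This assumes a hypothetical UserRepository backed by a fixed in-memory list; the class and method names are illustrative only:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

// Hypothetical repository; the data and names are made up for this sketch.
final class UserRepository {
    private final List<String> names = Arrays.asList("carol", "alice", "bob");

    // Option from the question: the caller picks the result type via a Collector.
    <R> R findAll(Collector<? super String, ?, R> collector) {
        return names.stream().collect(collector);
    }

    // First suggestion: hand back the Stream and let the caller collect or transform it.
    Stream<String> streamAll() {
        return names.stream();
    }

    // Second suggestion: return a concrete, documented collection type.
    List<String> listAll() {
        return new ArrayList<>(names);
    }
}
```

A caller of findAll would write something like repository.findAll(Collectors.toSet()), which is no shorter than repository.streamAll().collect(Collectors.toSet()) and offers less flexibility.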

Java 8 stream processing not fluent [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
I have a problem with Java 8 streams, where the data is processed in sudden bulks, rather than when they are requested. I have a rather complex stream-flow which has to be parallelised because I use concat to merge two streams.
My issue stems from the fact that data seems to be parsed in large bulks minutes - and sometimes even hours - apart. I would expect this processing to happen as soon as the Stream reads incoming data, to spread the workload. Bulk processing seems counterintuitive in almost every way.
So, the question is why this bulk-collection occurs and how I can avoid it.
My input is a Spliterator of unknown size and I use a forEach as the terminal operation.
It’s a fundamental principle of parallel streams that the encounter order doesn’t have to match the processing order. This enables concurrent processing of items of sublists or subtrees while assembling a correctly ordered result, if necessary. This explicitly allows bulk processing and even makes it mandatory for the parallel processing of ordered streams.
This behavior is determined by the particular implementation of the Spliterator’s trySplit method. The specification says:
If this Spliterator is ORDERED, the returned Spliterator must cover a strict prefix of the elements
…
API Note:
An ideal trySplit method efficiently (without traversal) divides its elements exactly in half, allowing balanced parallel computation.
Why was this strategy fixed in the specification and not, e.g. an even/odd split?
Well, consider a simple use case. A list will be filtered and collected into a new list, thus the encounter order must be retained. With the prefix rule, it’s rather easy to implement. Split off a prefix, filter both chunks concurrently, afterwards, add the result of the prefix filtering to the new list, followed by adding the filtered suffix.
With an even/odd strategy, that’s impossible. You may filter both parts concurrently, but afterwards, you don’t know how to join the results correctly unless you track each item’s position throughout the entire operation.
Even then, joining these interleaved items would be much more complicated than performing an addAll per chunk.
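The prefix rule can be illustrated with a hand-rolled sketch. This is not how the stream library is implemented; it only mimics the idea with one split, a CompletableFuture for the prefix, and an addAll per chunk. The class name and the "keep even numbers" filter are assumptions for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration of "split off a prefix, filter both chunks, join in order".
final class PrefixSplitFilter {

    static List<Integer> filterEvenByPrefixSplit(List<Integer> source) {
        int mid = source.size() / 2;
        List<Integer> prefix = source.subList(0, mid);             // strict prefix
        List<Integer> suffix = source.subList(mid, source.size()); // remaining elements

        // Filter the prefix on another thread while this thread filters the suffix.
        CompletableFuture<List<Integer>> filteredPrefix =
                CompletableFuture.supplyAsync(() -> keepEven(prefix));
        List<Integer> filteredSuffix = keepEven(suffix);

        // Joining is trivial: prefix results first, then suffix results.
        List<Integer> result = new ArrayList<>(filteredPrefix.join());
        result.addAll(filteredSuffix);
        return result;
    }

    private static List<Integer> keepEven(List<Integer> chunk) {
        List<Integer> kept = new ArrayList<>();
        for (Integer value : chunk) {
            if (value % 2 == 0) {
                kept.add(value);
            }
        }
        return kept;
    }
}
```

With an even/odd split, the two filtered chunks could not simply be appended; each surviving element would need its original index to be placed correctly.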
You might have noticed that all of this applies only if you have an encounter order that might have to be retained. If your spliterator doesn’t report an ORDERED characteristic, it is not required to return a prefix. Nevertheless, the default implementation you might have inherited from AbstractSpliterator is designed to be compatible with ordered spliterators. Thus, if you want a different strategy, you have to implement the split operation yourself.
Or you use a different way of implementing an unordered stream, e.g.
Stream.generate(() -> {
    LockSupport.parkNanos(TimeUnit.SECONDS.toNanos(1));
    return Thread.currentThread().getName();
}).parallel().forEach(System.out::println);
might be closer to what you expected.

What are the things to be kept in mind when aiming for a good class design? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
Yesterday I attended an interview at a leading IT services company. The technical interview went well, with no issues; then I moved on to another round about management, design and process. I answered everything except the question below.
Question asked by interviewer:
Let's say you are developing a class, which I am going to consume in my
class by extending it. What are the key points you keep in
mind? E.g., class A has a method called "method A" that returns a Collection,
let's say a "list". What precautions will you take?
My Answer: The following points I will consider, such as:
Class and method need to be public
The method returns a list, so it needs to use generics. That way we can avoid a ClassCastException.
If this class will be accessed in a multi-threaded environment, the method needs to be synchronized.
But the interviewer wasn't convinced by my points. He was expecting a different answer from me, but I could not follow his thought process or work out what he was expecting.
So please provide your suggestions.
I would want you to hold to the design principles of Single Responsibility, Open/Closed, and Dependency Injection. Keep it stateless, simple, and testable. Make sure it can be extended without needing to be changed.
But then, I wasn't interviewing you.
A few more points which haven't been mentioned yet would be:
Decent documentation for your class so that one doesn't have to dig too deep into your code to understand what functionality you offer and what are the gotchas.
Try extending your own class before handing it out to someone else. This way, you personally can feel the pain if your class is not well designed and thereby can improve it.
If you are returning a list or any collection, one important question you need to ask is, "can the caller modify the returned collection"? Or "is this returned list a direct representation of the internal state of your class?". In that case, you might want to return a copy to avoid callers messing up your internal state i.e. maintain proper encapsulation.
Plan the visibility of methods. Draw an explicit line between public, protected, package-private and private methods. Ensure that you don't expose any more than you actually want to. Removing features is hard. If something is missing from your well-designed API, you can add it later. But if you expose a slew of useless public methods, you really can't upgrade your API without deprecating methods, since you never know who else is using them.
If you are returning a collection, the first thing you should think about is should I protect myself from the caller changing my internal state e.g.
List list = myObject.getList();
list.retainAll(list2);
Now I have all the elements in common between list and list2. The problem is that myObject may not expect you to destroy the contents of the list it returned.
Two common ways to fix this are to take a defensive copy or to wrap the collection with a Collections.unmodifiableXxxx() method. For extra paranoia, you might do both.
The way I prefer to get around this is to avoid returning the collection at all. You can return a count and a method to get the n-th value or for a Map return the keys and provide a getter, or you can allow a visitor to each element. This way you don't expose your collection or need a copy.
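The three approaches could look roughly like this, assuming a hypothetical Playlist class that owns a list of track names (all names here are made up for the sketch):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical class; it only illustrates the three ways of protecting internal state.
final class Playlist {
    private final List<String> tracks = new ArrayList<>();

    void add(String track) {
        tracks.add(track);
    }

    // Defensive copy: the caller may change the result freely without affecting us.
    List<String> tracksCopy() {
        return new ArrayList<>(tracks);
    }

    // Unmodifiable view: no copying, but mutating calls throw UnsupportedOperationException.
    List<String> tracksView() {
        return Collections.unmodifiableList(tracks);
    }

    // No collection exposed at all: a count, an indexed getter and a visitor.
    int trackCount() {
        return tracks.size();
    }

    String trackAt(int index) {
        return tracks.get(index);
    }

    void forEachTrack(Consumer<? super String> visitor) {
        tracks.forEach(visitor);
    }
}
```

Note that the unmodifiable view still reflects later changes made through add, whereas the copy is a snapshot; which of the two is wanted depends on the caller.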
The question is very generic, but I want to add a few points:
Except for the methods you want to expose, make all other methods and variables private. The whole point is to keep visibility to a minimum.
Wherever possible make the class immutable; this will reduce overhead in a multithreaded environment.
You might want to evaluate whether serializability is to be supported or not. If not, then don't provide a default constructor. And if it is to be serializable, then do evaluate the serialization proxy pattern.

Good practice when manipulating data in java [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
Is it bad practice to directly manipulate data like:
Sorter.mergeSort(testData); //(testData is now sorted)
Or should I create A copy of the data and then manipulate and return that like:
sortedData = Sorter.mergeSort(testData); // (sortedData is now sorted and testData remains unsorted)?
I have several sorting methods and I want to be consistent in the way they manipulate data. With my insertionSort method I can directly work on the unsorted data. However, if I want to leave the unsorted data untouched then I would have to create a copy of the unsorted data in the insertionSort method and manipulate and return that (which seems rather unnecessary). On the other hand in my mergeSort method I need to create a copy of the unsorted data one way or another so I ended up doing something that also seems rather unnecessary as a work around to returning a new sortedList:
List <Comparable> sorted = mergeSortHelper(target);
target.clear();
target.addAll(sorted);
Please let me know which is the better practice, thanks!
It depends whether you're optimising for performance or functional purity. Generally in Java functional purity is not emphasised; for example, Collections.sort sorts the list you give it (even though it's implemented by making an array copy first).
I would optimise for performance here, as that seems more like typical Java, and anyone who wants to can always copy the collection first, like Sorter.mergeSort(new ArrayList<>(testData));
The best practice is to be consistent.
Personally I prefer my methods to not modify the input parameters since it might not be appropriate in all situations (you're pushing the responsibility onto the end user to make a copy if they need to preserve the original ordering).
That being said, there are clear performance benefits of modifying the input (especially for large lists). So this might be appropriate for your application.
As long as the functionality is clear to the end user you're covered either way!
In Java I usually provide both options (when writing re-usable utility methods, anyway):
/** Return a sorted copy of the data from col. */
public <T extends Comparable<T>> List<T> mergeSort(Collection<T> col);
/** Sort the data in col in place. */
public <T extends Comparable<T>> void mergeSortIn(List<T> col);
I'm making some assumptions re the signatures and types here. That said, the Java norm is - or at least, has been* - generally to mutate state in place. This is often a dangerous thing, especially across API boundaries - e.g. changing a collection passed to your library by its 'client' code. Minimizing the overall state-space and mutable state in particular is often the sign of a well designed application/library.
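A minimal sketch of offering both variants, assuming a hypothetical Sorter utility class. To keep it short, both methods delegate to Collections.sort rather than implementing merge sort, so only the copy-versus-in-place contract is shown:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical utility; delegates to Collections.sort instead of a hand-written merge sort.
final class Sorter {

    /** Returns a sorted copy; the input collection is left untouched. */
    static <T extends Comparable<? super T>> List<T> sortedCopy(Collection<T> col) {
        List<T> copy = new ArrayList<>(col);
        Collections.sort(copy);
        return copy;
    }

    /** Sorts the given list in place; the caller's list is modified. */
    static <T extends Comparable<? super T>> void sortInPlace(List<T> list) {
        Collections.sort(list);
    }
}
```

Naming the methods after what they do to the input (copy versus in place) makes the mutation contract visible at the call site, which matters more than the choice between the two.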
It sounds like you want to re-use the same test data. To do that I would write a method that builds the test data and returns it. That way, if I need the same test data again in a different test (i.e. to test your mergeSort() / insertionSort() implementations on the same data), you just build and return it again. I commonly do exactly this in writing unit tests (in JUnit, for example).
Either way, if your code is a library class/method for other people to use you should document its behaviour clearly.
Aside: in 'real' code there shouldn't really be any reason to specify that merge sort is the implementation used. The caller should care what it does, not how it does it - so the name wouldn't usually be mergeSort(), insertionSort(), etc.
(*) In some of the newer JVM languages there has been a conscious movement away from mutable data. Clojure's core data structures are immutable, and state changes go through a small set of managed reference types rather than in-place mutation. Scala provides immutable collection libraries alongside its mutable ones. This has major advantages in multi-threaded, multi-processor applications. It is not as expensive in time and space as might be naively expected, due to the clever algorithms the collections use.
In your particular case, modifying the "actual" data is more efficient. You are sorting data, and it has been observed that it's more efficient to work on sorted data than on unsorted data. So I don't see why you should keep the unsorted data. Check out "Why is it faster to process a sorted array than an unsorted array?"
Mutable objects should be manipulated in place by the function, like Arrays#sort.
But for immutable objects (like String), a method can only return a "new" object, like String#replace.

Is using output parameters considered bad practice? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
Every time I see a method where one of the parameters is an output parameter like
void addTokenErrorsToReport(List<String> tokens, Map<String, Integer> report)
I get the feeling that this is just plain wrong. From my point of view, parameters in general should be immutable, and not changed within a method. E.g., the above method could be rewritten to
Map<String, Integer> createTokenErrorsReport(List<String> tokens)
The returned Map could then be merged with the original report Map.
Is this assumption right? Or are both versions equally acceptable?
As with most things, it's only "bad practice" if it leads to poorly functioning / unreadable / hard-to-maintain code or if you don't know why you're doing it.
In most cases using an output parameter doesn't have those effects.
In your addTokenErrorsToReport, it certainly is an appropriate approach. You are adding token errors to a report - the function needs to know the tokens it is adding and the report it is adding to. The function clearly performs precisely the operation it was designed to perform with no disadvantages.
If you were to take the createTokenErrorsReport approach, you would have to follow every call to it by inserting the new tokens in the existing report. If adding tokens to an existing report is a common operation, it most definitely makes sense to have a method that adds. That's not to say that createTokenErrorsReport shouldn't exist as well - if creating new reports from a token list is a common operation, then you would want a function that does that.
A great example of a good use of an output parameter is Collections.sort, which sorts a list in place. The performance hit of creating a new copy of the list and returning the sorted copy is avoided, while at the same time it does not limit you from creating a copy and sorting the copy if you want to.
Just use the best tool for the job and keep your code succinct.
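For comparison, both styles can be sketched together. The question does not say what the report contains, so this assumes it maps each token to an error count; the TokenErrors class and the mergeInto helper are hypothetical:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; assumes the report counts errors per token.
final class TokenErrors {

    // Output-parameter style: adds to a report the caller already owns.
    static void addTokenErrorsToReport(List<String> tokens, Map<String, Integer> report) {
        for (String token : tokens) {
            report.merge(token, 1, Integer::sum);
        }
    }

    // Return-value style: builds a fresh report and leaves merging to the caller.
    static Map<String, Integer> createTokenErrorsReport(List<String> tokens) {
        Map<String, Integer> report = new HashMap<>();
        addTokenErrorsToReport(tokens, report);
        return report;
    }

    // The extra step every caller of the return-value style needs for an existing report.
    static void mergeInto(Map<String, Integer> target, Map<String, Integer> addition) {
        addition.forEach((token, count) -> target.merge(token, count, Integer::sum));
    }
}
```

The return-value style costs one extra map and a merge per call, which is the follow-up work the answer describes; in exchange the method has no side effects on its arguments.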
How would you add something to the map in the second example? I think it would be bad practice if you have to pass an empty map that gets filled in addTokenErrorsToReport. But in this case: no, I don't think it's bad practice. How would you implement otherwise if you have several List<String> tokens that you want to process? I think the first example is the straightforward one.
I think it depends on where you come from (language-wise). If you are used to writing C or C++, where you can use pointers as parameters, which is nice and practical, you could easily write code like your first example. I don't really think there is a good or bad here, just your style of coding.
I have seen this coding practice reasonably often and found it quite elegant. It allows you to 'return' multiple Objects.
For instance, in your above example, you could return an integer value corresponding to an error code.
