I have a file of records; each row begins with a timestamp followed by a few fields. The class wrapping the file implements Iterable:
@SuppressWarnings("unchecked")
@Override
public <E extends MarkedPoint> Stream<E> stream() {
    return (Stream<E>) StreamSupport.stream(spliterator(), false);
}
I would like to implement, with lambda expressions and the Streams API, something that is not just a filter but a mapping/accumulator that merges neighboring records (stream elements coming from an Iterable) that have the same timestamp. I would need an interface something like this:
// Pseudocode: Stream has no next() or emit(); this only sketches the intent.
MarkedPoint prevPoint = null;
void nextPoint(MarkedPoint in, Stream<MarkedPoint> inputStream, Stream<MarkedPoint> outputStream)
{
    while ( prevPoint.time == in.time )
    {
        updatePrevPoint(in);
        in = inputStream.next();
    }
    outputStream.emit(in);
    prevPoint = in;
}
That is rough pseudocode of the kind of API I imagine. Can someone please point me towards the most straightforward way of implementing this stream transformation? The resulting stream will necessarily have the same number of elements as the input or fewer, since it is essentially a filter plus an optional transformation applied when records with the same timestamp are encountered.
Thanks in advance
Streams don't work like that; a stream can have only one terminal operation (consumer). What you seem to be asking for is an on-the-fly reduction that may consume the next element(s) within your class. No dice with the standard Stream API.
You could first create a list of unmerged lines, then create an iterator that peeks at the next element(s) and merges them before returning the next merged element.
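The iterator idea above can be sketched roughly as follows. This is only an illustration under assumptions: the `MarkedPoint` fields (`time`, `value`), the `mergeWith` rule (summing values), and the class names are all hypothetical, since the question does not say how records should be merged. The merged iterator is wrapped back into a stream via `StreamSupport`.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class MergeNeighbors {

    // Hypothetical record type; field names and the merge rule are assumptions.
    static final class MarkedPoint {
        final long time;
        final int value;
        MarkedPoint(long time, int value) { this.time = time; this.value = value; }
        MarkedPoint mergeWith(MarkedPoint other) {
            return new MarkedPoint(time, value + other.value);
        }
        @Override public String toString() { return time + ":" + value; }
    }

    // Wraps a source iterator and merges runs of neighboring points that share a timestamp.
    // Assumes the source never yields null elements.
    static final class MergingIterator implements Iterator<MarkedPoint> {
        private final Iterator<MarkedPoint> source;
        private MarkedPoint lookahead; // first point of the next run, or null at the end

        MergingIterator(Iterator<MarkedPoint> source) {
            this.source = source;
            this.lookahead = source.hasNext() ? source.next() : null;
        }

        @Override public boolean hasNext() { return lookahead != null; }

        @Override public MarkedPoint next() {
            if (lookahead == null) throw new NoSuchElementException();
            MarkedPoint merged = lookahead;
            lookahead = null;
            while (source.hasNext()) {
                MarkedPoint candidate = source.next();
                if (candidate.time == merged.time) {
                    merged = merged.mergeWith(candidate); // same timestamp: fold into the run
                } else {
                    lookahead = candidate;                // different timestamp: keep for next call
                    break;
                }
            }
            return merged;
        }
    }

    // Exposes the merged sequence as a sequential, ordered stream.
    static Stream<MarkedPoint> mergedStream(Iterable<MarkedPoint> points) {
        Iterator<MarkedPoint> it = new MergingIterator(points.iterator());
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
    }

    public static void main(String[] args) {
        List<MarkedPoint> input = Arrays.asList(
                new MarkedPoint(1, 10), new MarkedPoint(1, 5),
                new MarkedPoint(2, 7), new MarkedPoint(3, 1), new MarkedPoint(3, 2));
        // prints 1:15, 2:7, 3:3
        System.out.println(mergedStream(input).map(MarkedPoint::toString)
                .collect(Collectors.joining(", ")));
    }
}
```

Unlike the "build a list first" variant, this version merges lazily, so only one lookahead element is held at a time.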
Is there a way to call the Java stream API to perform a function for all but the last elements of an Iterable and call another on the last without splitting it into two separate calls?
This would save two passes over the array: one for splitting the array into its head array and tail element, and another to iterate over those two and apply a function.
My use case is calling repo.save(entity) on all but the last element and repo.saveAndFlush(entity) on the last.
Assume I have an Iterable<FooEntity> items;
I'm hoping for a solution along the lines of items.stream().???.profit(!)
Update:
Here is my class updated as per @jonsharpe's comment:
public class FooWriter implements ItemWriter<FooEntityManifest> {
    private final FooRepository fooRepo;
    @PersistenceContext
    private EntityManager em;
    @Autowired
    public FooWriter(FooRepository fooRepo) {
        this.fooRepo = fooRepo;
    }
    @Override
    public void write(List<? extends FooEntityManifest> items) {
        items.forEach(fooEM -> {
            FooEntity foo = fooEM.getChangedObject();
            fooRepo.save(foo);
        });
        em.flush();
    }
}
As I mentioned in the comments, I'm unsure whether this injects the correct EntityManager, so I would rather use the repo only. Are my concerns valid?
P.S. I realize that my collection interface is of List and not Iterable but I was wondering about this in a general sense.
The simplest solution is to treat all items equally like
items.forEach(fooEM -> fooRepo.save(fooEM.getChangedObject()));
em.flush();
If you want to treat the last element specially, the Stream API is not the right tool for the job. There are possible solutions, but they will be more complicated than using another API.
E.g. considering that your starting point is a List:
if(!items.isEmpty()) {
int last = items.size()-1;
items.subList(0, last).forEach(fooEM -> fooRepo.save(fooEM.getChangedObject()));
fooRepo.saveAndFlush(items.get(last).getChangedObject());
}
You can use reduce to simulate such a behavior. E.g.:
list.stream()
.reduce((a, b) -> {
repo.save(a);
return b;
})
.ifPresent(x -> repo.saveAndFlush(x));
But, to be completely honest, this is quite clunky, and from a maintenance point of view, you might be better off using @jonrsharpe's suggestion in the comments: "In this example, why not .save all of them then .flush after"?
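For the general Iterable case raised in the question, a plain Iterator can do this in one pass without streams. The following is a minimal sketch; the helper name `forEachThenLast` is made up for illustration, and the two consumers stand in for `repo.save` and `repo.saveAndFlush`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class LastElementDemo {

    // Applies 'each' to every element except the last, and 'last' to the final one,
    // in a single pass over any Iterable.
    static <T> void forEachThenLast(Iterable<T> items,
                                    Consumer<? super T> each,
                                    Consumer<? super T> last) {
        Iterator<T> it = items.iterator();
        while (it.hasNext()) {
            T item = it.next();
            if (it.hasNext()) {
                each.accept(item);   // more elements follow
            } else {
                last.accept(item);   // this was the final element
            }
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        forEachThenLast(Arrays.asList("a", "b", "c"),
                x -> log.add("save " + x),
                x -> log.add("saveAndFlush " + x));
        System.out.println(log); // prints [save a, save b, saveAndFlush c]
    }
}
```

An empty Iterable simply results in no calls at all.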
I want to know how to process and manage a tabular data stream in Java.
Consider a table of records with the schema (name, age, zip-code, disease). The records are to be read and processed tuple by tuple over time, as a stream. I want to save each processed tuple with the schema (age, zip-code, disease); that is, the name attribute is dropped.
For example: read Tuple 1 (han, 25, 12548, flu) at time t1,
publish Tuple 1* (25, 12548, flu);
read Tuple 2 (alex, 27, 12544, cancer) at time t2,
output Tuple 2* (27, 12544, cancer);
and so on. Can anyone help me?
Here are some suggestions for a framework you can base your final application on.
First, make classes to represent your input and output records. We'll call them InRecord and OutRecord for the sake of discussion, but you can give them whatever names make sense for you. Give them private fields to hold the necessary data and public getter/setter methods to access the data.
Second, define an interface for an input supplier; let's call it InputSupplier for this discussion. It will need setup (open()) and tear-down (close()) methods to be called at the start and end of processing, and a getNext() method that returns the next available InRecord. You'll need to decide how it indicates end-of-input: either define that getNext() returns null if there are no more input records, or provide a hasNext() method that returns true or false to indicate whether another input record is available.
Third, define an interface for an output consumer (OutputConsumer). You'll want to have open() and close() methods, as well as an accept(OutRecord) method.
With this infrastructure in place, you can write your processing method:
public void process(InputSupplier in, OutputConsumer out){
    in.open();
    out.open();
    InRecord inrec;
    // getNext() returns null once the input is exhausted
    while ((inrec = in.getNext()) != null){
        // copy everything except the name into the output record
        OutRecord outrec = new OutRecord(inrec.getAge(), inrec.getZipCode(), inrec.getDisease());
        out.accept(outrec);
    }
    out.close();
    in.close();
}
Finally, write some "dummy" I/O classes, one that implements InputSupplier and another that implements OutputConsumer. For test purposes, your input supplier can just return a few hand-created records and your output consumer could just print on the console the output records you send it.
Then all you need is a main method to tie it all together:
public static void main(String[] args){
InputSupplier in = new TestInput();// our "dummy" input supplier class
OutputConsumer out = new TestOutput(); // our "dummy" output consumer
process(in, out);
}
For the "real" application you'd write a "real" input supplier class, still implementing the InputSupplier interface, that can read from a database or an Excel file or whatever input source, and a new output consumer class, still implementing the OutputConsumer interface, that can take output records and store them in whatever format is appropriate. Your processing logic won't have to change, because you coded it in terms of the InputSupplier and OutputConsumer interfaces. Now just tweak main a bit and you've got your final app:
public static void main(String[] args){
InputSupplier in = new RealInput();// our "real" input supplier class
OutputConsumer out = new RealOutput(); // our "real" output consumer
process(in, out);
}
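Putting the pieces described above together, a self-contained version of the test setup might look like the sketch below. The field types, the constructor signatures, and the choice of null from getNext() as the end-of-input signal are assumptions made for illustration; the answer leaves those decisions open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RecordPipeline {

    // Input tuple (name, age, zipCode, disease); field types are assumptions.
    static final class InRecord {
        private final String name;
        private final int age;
        private final String zipCode;
        private final String disease;
        InRecord(String name, int age, String zipCode, String disease) {
            this.name = name; this.age = age; this.zipCode = zipCode; this.disease = disease;
        }
        String getName() { return name; }
        int getAge() { return age; }
        String getZipCode() { return zipCode; }
        String getDisease() { return disease; }
    }

    // Output tuple without the name attribute.
    static final class OutRecord {
        private final int age;
        private final String zipCode;
        private final String disease;
        OutRecord(int age, String zipCode, String disease) {
            this.age = age; this.zipCode = zipCode; this.disease = disease;
        }
        @Override public String toString() {
            return "(" + age + ", " + zipCode + ", " + disease + ")";
        }
    }

    interface InputSupplier {
        void open();
        InRecord getNext(); // returns null when no more records are available
        void close();
    }

    interface OutputConsumer {
        void open();
        void accept(OutRecord rec);
        void close();
    }

    // "Dummy" supplier that hands out a few hand-made records.
    static final class TestInput implements InputSupplier {
        private Iterator<InRecord> records;
        @Override public void open() {
            records = Arrays.asList(
                    new InRecord("han", 25, "12548", "flu"),
                    new InRecord("alex", 27, "12544", "cancer")).iterator();
        }
        @Override public InRecord getNext() { return records.hasNext() ? records.next() : null; }
        @Override public void close() { records = null; }
    }

    // "Dummy" consumer that prints each record and remembers what it printed.
    static final class TestOutput implements OutputConsumer {
        final List<String> lines = new ArrayList<>();
        @Override public void open() { lines.clear(); }
        @Override public void accept(OutRecord rec) {
            lines.add(rec.toString());
            System.out.println(rec);
        }
        @Override public void close() { }
    }

    static void process(InputSupplier in, OutputConsumer out) {
        in.open();
        out.open();
        InRecord inrec;
        while ((inrec = in.getNext()) != null) {
            out.accept(new OutRecord(inrec.getAge(), inrec.getZipCode(), inrec.getDisease()));
        }
        out.close();
        in.close();
    }

    public static void main(String[] args) {
        process(new TestInput(), new TestOutput());
    }
}
```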
This question is general, but I feel it is best explained with a specific example. Let's say I have a directory with many nested sub directories and in some of those sub directories there are text files ending with ".txt". A sample structure could be:
dir1
dir2
file1.txt
dir3
file2.txt
file3.txt
I'd be interested if there were a way in Java to build a method that could be called to return the successive text files:
TextCrawler crawler = new TextCrawler(new File("dir1"));
File textFile;
textFile = crawler.nextFile(); // value is file1.txt
textFile = crawler.nextFile(); // value is file2.txt
textFile = crawler.nextFile(); // value is file3.txt
Here is the challenge: No internal list of all the text files can be saved in the crawler object. That is trivial. In that case you'd simply build into the initialization a method that recursively builds the list of files.
Is there a general way of pausing a recursive method so that when it is called again it returns to the specific point in the stack where it left? Or will we have to write something that is specific to each situation and solutions necessarily have to vary for file crawlers, org chart searches, recursive prime finders, etc.?
If you want a solution that works on any recursive function, you can accept a Consumer object. It may look something like this:
public void recursiveMethod(Consumer<TreeNode> func, TreeNode node){
if(node.isLeafNode()){
func.accept(node);
} else{
//Perform recursive call
}
}
For a bunch of files, it might look like this:
public void recursiveMethod(Consumer<File> func, File curFile){
if(curFile.isFile()){
func.accept(curFile);
} else{
for(File f : curFile.listFiles()){
recursiveMethod(func, f);
}
}
}
You can then call it with:
File startingFile = new File("dir1"); // must point to a directory
recursiveMethod((File file) -> {
    // Do something with file
}, startingFile);
Adapt as necessary.
I think the state should be saved while you return from your recursive function, then you need to restore the state as you call that recursive function again. There is no generic way to save such a state, however a template can probably be created. Something like this:
class Crawler<T> {
LinkedList<T> innerState;
Callback<T> callback;
constructor Crawler(T base,Callback<T> callback) {
innerState=new LinkedList<T>();
innerState.push(base);
this.callback=callback; // I want functions passed here
}
T recursiveFunction() {
T base=innerState.pop();
T result=recursiveInner(base);
if (!result) innerState.push(base); // full recursion complete
return result;
}
private T recursiveInner(T element) {
ArrayList<T> c=callback.getAllSubElements(element);
T d;
for each (T el in c) {
if (innerState.length()>0) {
d=innerState.pop();
c.skipTo(d);
el=d;
if (innerState.length()==0) el=c.getNext();
// we have already processed "d", if full inner state is restored
}
T result=null;
if (callback.testFunction(el)) result=el;
if ((!result) && (callback.recursiveFunction(el))) result=recursiveInner(el); // if we can recurse on this element, go for it
if (result) {
// returning true, go save state
innerState.push(el); // push current local state to "stack"
return result;
}
} // end foreach
return null;
}
}
interface Callback<T> {
bool testFunction(T element);
bool recursiveFunction(T element);
ArrayList<T> getAllSubElements(T element);
}
Here, skipTo() is a method that moves the iterator on c to the provided element. Callback<T> is a means of passing functions to the class to be used as condition checkers: say, "Is T a folder?" for the recursion check and "Is T a *.txt?" for the return check; getAllSubElements belongs here too. The for each loop is pseudocode, owing to my lack of knowledge of how to work with modifiable iterators in Java; please adapt it to actual code.
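For the file-crawling case specifically, the "save and restore state" idea can be realized by replacing the call stack with an explicit stack of pending entries. The sketch below is one possible reading of that idea, not the template above made literal; `TextCrawler` and `nextFile()` are taken from the question, and everything else is an assumption. The stack holds only the not-yet-visited siblings along the current path, never a list of all text files.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class TextCrawler {

    // Entries still to be visited; this explicit stack stands in for the call stack.
    private final Deque<File> pending = new ArrayDeque<>();

    public TextCrawler(File root) {
        pending.push(root);
    }

    // Returns the next .txt file in depth-first order, or null when the tree is exhausted.
    public File nextFile() {
        while (!pending.isEmpty()) {
            File current = pending.pop();
            if (current.isDirectory()) {
                File[] children = current.listFiles();
                if (children == null) {
                    continue; // unreadable directory: skip it
                }
                Arrays.sort(children); // listFiles() guarantees no order; sort for repeatability
                for (int i = children.length - 1; i >= 0; i--) {
                    pending.push(children[i]); // push in reverse so the first child is popped first
                }
            } else if (current.getName().endsWith(".txt")) {
                return current; // "pause" here; the stack remembers where to resume
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TextCrawler crawler = new TextCrawler(new File(args.length > 0 ? args[0] : "."));
        File textFile;
        while ((textFile = crawler.nextFile()) != null) {
            System.out.println(textFile.getPath());
        }
    }
}
```

The same pattern carries over to other tree-shaped searches by changing what gets pushed and what counts as a result.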
The only way I can think of that would meet your exact requirement would be to perform the recursive tree walk in a separate thread, and have that thread deliver results back to the main thread one at a time. (For simplicity you could use a bounded queue for the delivery, but it is also possible to implement it using wait / notify, a lock object and a single shared reference variable.)
In Python, for example, this would be a good fit for coroutines. Unfortunately, Java doesn't have a direct equivalent.
I should add that using threads is likely to incur significant overhead in synchronization and thread context switching. Using a queue will reduce them to a degree provided that rate of "producing" and "consuming" is well matched.
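A rough sketch of the thread-plus-bounded-queue variant follows. The class name `ThreadedTextCrawler`, the queue capacity of 1, and the `END` sentinel are all choices made for this illustration. The walker thread runs the ordinary recursive method and blocks on `put` until the consumer has taken the previous file.

```java
import java.io.File;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ThreadedTextCrawler {

    // Sentinel marking the end of the walk; compared by identity, never returned to callers.
    private static final File END = new File("");

    // Capacity 1 keeps the walker at most one result ahead of the consumer.
    private final BlockingQueue<File> queue = new ArrayBlockingQueue<>(1);
    private boolean done;

    public ThreadedTextCrawler(File root) {
        Thread walker = new Thread(() -> {
            try {
                walk(root);
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        walker.setDaemon(true); // do not keep the JVM alive if the consumer stops early
        walker.start();
    }

    // Plain recursion; it "pauses" whenever put() blocks on a full queue.
    private void walk(File f) throws InterruptedException {
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children != null) {
                for (File child : children) {
                    walk(child);
                }
            }
        } else if (f.getName().endsWith(".txt")) {
            queue.put(f);
        }
    }

    // Returns the next .txt file, or null once the walk has finished.
    public File nextFile() throws InterruptedException {
        if (done) {
            return null;
        }
        File next = queue.take();
        if (next == END) {
            done = true;
            return null;
        }
        return next;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadedTextCrawler crawler = new ThreadedTextCrawler(new File(args.length > 0 ? args[0] : "."));
        File textFile;
        while ((textFile = crawler.nextFile()) != null) {
            System.out.println(textFile.getPath());
        }
    }
}
```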
What is the correct way of using lambdas in a recursive method? I have been trying to write a recursive depth-first search function for a graph. I have tried implementing a lambda version, but I am not sure whether my implementation is the correct way of using lambdas in a recursive function.
Outline of the code:
a) Old fashioned way
private void depthFirstSearch(final Graph graph, final int sourceVertex){
count++;
marked[sourceVertex]= true;
for(int vertex:graph.getAllVerticesConnectedTo(sourceVertex)){
if(marked[vertex]!=true){
edgeTo[vertex]=sourceVertex;
depthFirstSearch(graph,vertex);
}
}
}
b) Java 8 Lambdas way:
private void depthFirstSearchJava8(final Graph graph, final int sourceVertex){
count++;
marked[sourceVertex]= true;
StreamSupport.stream(graph.getAllVerticesConnectedTo(sourceVertex).spliterator(),false)
.forEach(vertex -> {
if(marked[vertex]!=true){
edgeTo[vertex]=sourceVertex;
depthFirstSearchJava8(graph,vertex);
}
});
}
I have tried to write a lambda version as above but could not figure out the advantage it is providing as compared to the traditional way.
Thanks
Just because lambdas exist, this doesn't mean you have to use them everywhere.
You are looping over an iterable, without filtering or mapping or transforming anything (which are the typical use cases for lambdas).
The for loop does what you want in a one-liner. Therefore, lambdas should not be used here.
That's because there is no advantage, at least not in this case. Lambdas are useful when you want to create a small function to be used in just one place in the program, e.g. when passing the lambda as an argument for another function. If your lambda takes more than one line of code, you should reconsider the idea of using it.
You could rewrite your depthFirstSearch method as follows:
private void depthFirstSearchJava8(Graph graph, int sourceVertex){
count++;
marked[sourceVertex] = true;
graph.getAllVerticesConnectedTo(sourceVertex).stream()
.filter(vertex -> !marked[vertex])
.peek(vertex -> edgeTo[vertex] = sourceVertex)
.forEach(vertex -> depthFirstSearchJava8(graph, vertex));
}
This code assumes getAllVerticesConnectedTo() method returns a collection of integers. If it returns an array of integers instead, then use the following code:
private void depthFirstSearchJava8(Graph graph, int sourceVertex){
count++;
marked[sourceVertex] = true;
Arrays.stream(graph.getAllVerticesConnectedTo(sourceVertex))
.filter(vertex -> !marked[vertex])
.peek(vertex -> edgeTo[vertex] = sourceVertex)
.forEach(vertex -> depthFirstSearchJava8(graph, vertex));
}
In the first solution, I've used the Collection.stream() method to get a stream of connected vertices, while in the second one, I've used the Arrays.stream() method. Then, in both solutions, I've first used filter() to keep only non marked vertices and peek() to modify the edgeTo array. Finally, forEach() is used to terminate the stream by invoking depthFirstSearchJava8() method recursively.
Is there a way to return some value from within a for loop without jumping out of the loop?
I am implementing a static analysis tool in which I have to analyze a list of methods (CFGs) in a for loop. The size of the CFG list is not known in advance. Each method analyzed in the for loop has to return some value. As asked above, is there a way to do this in a loop without breaking out of the loop? One alternative that comes to mind is to unroll the loop, assuming the maximum list size is some fixed value, but this does not solve the problem completely. Any help would be appreciated.
code looks like below.
for(CFG cfg: cfgList)
{
val = analyze(cfg);
return val; //I want for loop not to stop here.
}
P.S. I cannot store the values in a list to return values later.
Edit1:
For example, consider following statements.
call method1();
st2;
st3;
...
This method1() can be any of five different methods. For all five possible options, I want to analyze each of them, return their values and analyze rest of the statements accordingly. So, I would analyze these 5 methods as below.
call method1-option1();
st2;
st3;
...
call method1-option2();
st2;
st3;
...
call method1-option3();
st2;
st3;
...
I hope this helps in understanding the question.
No, you cannot return a value from a loop without jumping out of it. For your needs, you have to save the values in a list and return that list after the loop finishes.
In Java 8, you can do:
Iterator<AnalysisResult> lazyAnalysisResults = cfgList.stream()
.map(cfg -> analyze(cfg))
.iterator();
And then the Iterator will supply new analyzed results one at a time, without you needing to collect them all into a list first.
Prior to Java 8, if you want your transformation to be lazy, the best you can do is to implement an Iterator yourself:
public final class AnalyzingIterator implements Iterator<AnalysisResult> {
    private final Iterator<CFG> iter;
    public AnalyzingIterator(Iterator<CFG> iter) {
        this.iter = iter;
    }
    @Override public boolean hasNext() {
        return iter.hasNext();
    }
    // Analyzes lazily: each CFG is processed only when its result is requested
    @Override public AnalysisResult next() {
        return analyze(iter.next());
    }
    @Override public void remove() {
        throw new UnsupportedOperationException();
    }
}
If you don't want to store the results in a List and return them all together, you can use a callback mechanism.
Have analyze() start a new thread, passing it cfg as well as a reference to this. When processing is over, the processing thread calls a callback method on your current instance, passing the analyzed value. Continue doing whatever you intend to do with the returned value in the callback method. You don't have to alter your for loop.
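A simplified, single-threaded sketch of the callback idea is shown below; it leaves out the separate thread described above and just hands each result to a `Consumer` as soon as it is computed, so nothing is stored in a list. The `CFG` class, the `analyze` body, and the `analyzeAll` helper are hypothetical placeholders for the real analysis.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class CallbackAnalysis {

    // Hypothetical stand-in for a control-flow graph.
    static final class CFG {
        final String name;
        CFG(String name) { this.name = name; }
    }

    // Hypothetical analysis: here it simply returns the length of the name.
    static int analyze(CFG cfg) {
        return cfg.name.length();
    }

    // Passes each result to the callback immediately; the loop itself never returns a value.
    static void analyzeAll(List<CFG> cfgList, Consumer<Integer> onResult) {
        for (CFG cfg : cfgList) {
            onResult.accept(analyze(cfg));
        }
    }

    public static void main(String[] args) {
        List<CFG> cfgList = Arrays.asList(new CFG("method1"), new CFG("m2"));
        // prints "analyzed: 7" then "analyzed: 2"
        analyzeAll(cfgList, result -> System.out.println("analyzed: " + result));
    }
}
```

The code that should run "after the return" for each method goes into the callback body.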