Find the biggest prices among a huge number of CSV files - Java

I have 100 CSV files with the following content:

name,price
book,12.4
bread,54.23

Each file's content is sorted by price. I need to find the 10 most expensive products across all these files. This is my code:
import org.apache.commons.io.FileUtils;
import org.junit.Assert;
import org.junit.Test;

import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

import static java.util.stream.Collectors.toList;

public final class FindBiggest extends Assert {

    static class Data {
        private final String name;
        private final float price;

        public Data(String str) {
            final String[] split = str.split(",");
            this.name = split[0];
            this.price = Float.parseFloat(split[1]);
        }

        @Override
        public String toString() {
            return name + "=" + price;
        }
    }

    @Test
    public void test() throws Exception {
        final List<File> files = Files.walk(Paths.get("/tmp/"))
                .filter(Files::isRegularFile)
                .filter(path -> path.toString().endsWith(".csv"))
                .map(Path::toFile)
                .collect(toList());

        final List<Data> collect = files.stream()
                .map(FindBiggest::content)
                .flatMap(text -> Arrays.stream(text.split("\\R")))      // one stream element per CSV line
                .filter(line -> !line.startsWith("name,"))              // skip the header row
                .map(Data::new)
                .sorted((o1, o2) -> Float.compare(o2.price, o1.price))  // most expensive first
                .limit(10)
                .collect(toList());

        System.out.println(collect);
    }

    private static String content(final File file) {
        try {
            return FileUtils.readFileToString(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
When there are a lot of CSV files, the program throws an OOM (OutOfMemoryError). How can I implement the program so that it finds the top prices across all files without loading all the data into memory?

You'll need a sorted set limited to a certain number of items. Some third-party collections libraries probably provide one; otherwise you can build something like a limited SortedSet yourself. The important thing is that the add method of such a sorted set must return false if the collection is full and the newly added element falls beyond the limit, and true otherwise.
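A minimal sketch of such a structure, assuming a hypothetical class name BoundedTopSet (it is not from any library). Note that TreeSet treats elements that compare as equal as duplicates, so when prices can repeat the comparator should break ties, e.g. by name:

import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical helper, not a library class: keeps only the `limit`
// largest elements according to the comparator.
final class BoundedTopSet<T> {
    private final int limit;
    private final TreeSet<T> set;

    BoundedTopSet(int limit, Comparator<T> comparator) {
        this.limit = limit;
        this.set = new TreeSet<>(comparator);
    }

    // Returns false when the set is full and e is not larger than the
    // smallest element currently kept.
    boolean add(T e) {
        if (set.size() < limit) {
            return set.add(e);
        }
        if (set.comparator().compare(e, set.first()) <= 0) {
            return false;
        }
        set.pollFirst(); // evict the current minimum to make room
        return set.add(e);
    }

    TreeSet<T> items() {
        return set;
    }
}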
Now, loop over the CSV files. Inside the loop body, read records from the current file and add them to the set until add returns false: that means the collection is full and none of the remaining records in that file can be larger than the ones already kept, so it is time to move on to the next file. Note that this early exit only works if you visit each file's records from most expensive to least expensive; since the sample files are sorted ascending by price, read each file's lines in reverse.
When the loop is done, the resulting set is the answer.
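A sketch of that loop under the assumptions above (the hypothetical BoundedTopSet helper, files sorted ascending by price as in the question, at most one file's lines in memory at a time, and product names that contain no commas):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public final class TopTenAcrossFiles {
    public static void main(String[] args) throws IOException {
        // Order records by price, breaking ties by name so equal prices
        // are not dropped as TreeSet duplicates.
        BoundedTopSet<String[]> top = new BoundedTopSet<>(10,
                Comparator.comparing((String[] r) -> Float.parseFloat(r[1]))
                          .thenComparing(r -> r[0]));

        try (Stream<Path> paths = Files.walk(Paths.get("/tmp/"))) {
            List<Path> csvs = paths.filter(Files::isRegularFile)
                                   .filter(p -> p.toString().endsWith(".csv"))
                                   .collect(Collectors.toList());
            for (Path csv : csvs) {
                List<String> lines = Files.readAllLines(csv);
                if (lines.isEmpty()) {
                    continue;
                }
                lines.remove(0);            // drop the "name,price" header
                Collections.reverse(lines); // visit most expensive records first
                for (String line : lines) {
                    if (!top.add(line.split(","))) {
                        break; // everything left in this file is cheaper
                    }
                }
            }
        }
        top.items().descendingSet()
           .forEach(r -> System.out.println(r[0] + " = " + r[1]));
    }
}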

Related

Cannot read the array length because "<local1>" is null

I am making a stock market simulator app in Java, and there is an issue in the deleteHistoryFiles() method: it says that the array is null. However, I have no idea which array this error is talking about.
Here's the code (I've deleted some methods to save space):
package stock.market.simulator;

import java.util.Random;
import java.text.DecimalFormat;
import java.util.Timer;
import java.util.TimerTask;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class StockMarketSimulator {

    // Path to where the files are stored for rate history
    // USE WHEN RUNNING PROJECT IN NETBEANS
    //public static final String HISTORYFILEPATH = "src/stock/market/simulator/history/";

    // Path to history files to be used when executing program through jar file
    public static final String HISTORYFILEPATH = "history/";

    public static void main(String[] args) throws IOException {
        accountProfile accProfile = accountCreation();
        stockProfile[][] stockProfile = createAllStocks();
        deleteHistoryFiles(new File(HISTORYFILEPATH));
        createHistoryFiles(stockProfile);
        mainWindow window = new mainWindow(accProfile, stockProfile);
        recalculationLoop(stockProfile, window);
    }

    // Procedure to create the history files
    public static void createHistoryFiles(stockProfile[][] stocks) throws IOException {
        String fileName;
        FileWriter fileWriter;
        for (stockProfile[] stockArray : stocks) {
            for (stockProfile stock : stockArray) {
                fileName = stock.getProfileName() + ".csv";
                fileWriter = new FileWriter(HISTORYFILEPATH + fileName);
            }
        }
    }

    // Procedure to delete the history files
    public static void deleteHistoryFiles(File directory) {
        for (File file : directory.listFiles()) {
            if (!file.isDirectory()) {
                file.delete();
            }
        }
    }
}
I got the same exception in exactly the same scenario: I tried to create an array of files by calling File.listFiles() and then iterating over the array.
I got the exception Cannot read the array length because "<local3>" is null.
The problem is that the path to the directory simply does not exist (my code was copied from another machine with a different folder structure).
I didn't understand at first where <local1> (sometimes <local3>) comes from and what it means; you would expect the message to simply read: Cannot read the array length because the array is null.
Edit (answering a comment): the one interesting question here is what <local1> is. It is just the array created by the File.listFiles() call, and the array is null because of the wrong path. The JVM's helpful NullPointerException message can only show a real variable name when the class is compiled with local-variable debug information (javac -g); without it, the variable is identified by its slot number, hence <local1> or <local3>.
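A defensive version of deleteHistoryFiles(), as a minimal sketch: File.listFiles() returns null when the path does not exist or is not a directory, so guard before iterating.

import java.io.File;

public final class HistoryCleaner {
    // Sketch only: fail fast with a clear message instead of an NPE on
    // the null array that listFiles() returns for a missing directory.
    public static void deleteHistoryFiles(File directory) {
        File[] files = directory.listFiles(); // this is the "<local1>" array
        if (files == null) {
            throw new IllegalArgumentException(
                    "Not a readable directory: " + directory.getAbsolutePath());
        }
        for (File file : files) {
            if (!file.isDirectory()) {
                file.delete();
            }
        }
    }
}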

How do I change the value of a java Path object?

Below, I am trying to change the value of the Path object there using the setSoundPath() method. I cannot find any documentation that says this is possible.
I am trying to create a class that makes a copy of a file at a specified path and puts the copy in a specified folder. I need to be able to change the path, though, because I want to create the Sound object with an initial placeholder file path.
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.*;
import java.io.IOException;
import javafx.beans.property.StringProperty;
import javafx.beans.property.SimpleStringProperty;

class Scratch {

    public static class Sound extends Object {
        private Path there;
        StringProperty tests = new SimpleStringProperty(this, "test", "");

        public Sound() {
            this.there = Paths.get("C:\\Users\\HNS1Lab.NETWORK\\Videos\\JuiceWRLD.mp3");
        }

        public void setSoundPath(String SoundPath) {
            this.tests.setValue(SoundPath);
            // Note: use getValue(), not toString(); toString() on a property
            // returns bean metadata, not the stored string.
            this.there = Paths.get(this.tests.getValue());
        }

        // copySound() and getSoundPath() are not shown in the question.
    }

    public static void main(String[] args) {
        Sound test = new Sound();
        test.setSoundPath("C:\\Users\\HNS1Lab.NETWORK\\Music\\Meowing-cat-sound.mp3");
        test.copySound();
        System.out.println("Path: " + test.getSoundPath().toString());
    }
}
They are immutable:
Implementations of this interface are immutable and safe for use by
multiple concurrent threads.
(from: https://docs.oracle.com/javase/7/docs/api/java/nio/file/Path.html)
All you can do is create a new Path object that points to the path you want and assign it to your field (which is what your setter already does).
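A short illustration (a sketch, not from the question's code): methods like resolve() return a new Path and leave the original untouched, so "changing" a Path field just means assigning a new instance to it.

import java.nio.file.Path;
import java.nio.file.Paths;

public class PathImmutabilityDemo {
    public static void main(String[] args) {
        Path base = Paths.get("C:\\Users");
        Path sound = base.resolve("Music\\Meowing-cat-sound.mp3"); // new object
        System.out.println(base);  // C:\Users - unchanged
        System.out.println(sound); // C:\Users\Music\Meowing-cat-sound.mp3
    }
}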

How do I save a time window in Flink to a text file?

I am starting to work with Apache Flink in Java.
My goal is to consume an Apache Kafka topic in one-minute time windows, apply some very basic processing, and write the result of each window to a file.
So far I have managed to apply a simple text transformation to what I receive, but I am somewhat lost on whether I should use apply or process to write each window's result to a file.
This is my code so far:
package myflink;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import java.time.ZoneId;
import java.util.Date;
import java.util.Properties;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.shaded.akka.org.jboss.netty.channel.ExceptionEvent;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.windowing.AllWindowFunction;
import org.apache.flink.streaming.api.functions.windowing.ProcessAllWindowFunction;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;
import scala.util.parsing.json.JSONObject;

public class BatchJob {

    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "localhost:9092");
        properties.setProperty("zookeeper.connect", "localhost:2181");
        properties.setProperty("group.id", "test");
        properties.setProperty("auto.offset.reset", "latest");
        FlinkKafkaConsumer consumer = new FlinkKafkaConsumer("topic-basic-test", new SimpleStringSchema(), properties);
        DataStream<String> data = env.addSource(consumer);
        data.flatMap(new JSONparse()).timeWindowAll(Time.minutes(1))."NEXT ??".print()
        System.out.println("Hola usuario 2");
        env.execute("Flink Batch Java API Skeleton");
    }

    public static class JSONparse implements FlatMapFunction<String, Tuple2<String, String>> {
        @Override
        public void flatMap(String s, Collector<Tuple2<String, String>> collector) throws Exception {
            System.out.println(s);
            s = s + "ACA PODES JUGAR NDEAH";
            collector.collect(new Tuple2<String, String>("M", s));
        }
    }
}
If you want the result of each one-minute window to go to its own file, you can look at using the StreamingFileSink with one-minute buckets -- which should do what you are looking for, or come very close.
I think you'll actually end up with a directory for each window containing a file from each parallel instance of the window -- but since you are using timeWindowAll, which does not operate in parallel, there will be only one file per bucket, unless the results are so large that the file rolls over.
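A rough sketch of that sink (Flink 1.x row format; the output path and the minute-granularity bucket format are illustrative choices, not values from the question):

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.DateTimeBucketAssigner;

public class WindowToFiles {
    public static void attachSink(DataStream<String> windowedResults) {
        StreamingFileSink<String> sink = StreamingFileSink
                .<String>forRowFormat(new Path("/tmp/window-output"),
                                      new SimpleStringEncoder<String>("UTF-8"))
                // one bucket (directory) per minute of processing time
                .withBucketAssigner(new DateTimeBucketAssigner<String>("yyyy-MM-dd--HH-mm"))
                .build();
        windowedResults.addSink(sink);
    }
}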
By the way, doing JSON parsing in a FlatMap will perform rather poorly, because it ends up instantiating a new parser for each event, which in turn causes considerable GC activity. It would be better to use a RichFlatMapFunction and create one parser in the open() method that you then reuse for each event. Better still, use a JSONKeyValueDeserializationSchema rather than a SimpleStringSchema and have the Kafka connector handle the JSON parsing for you.
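A sketch of that open()-based pattern, assuming Jackson's ObjectMapper as the parser (the question's code does not show which parser is used):

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

public class JSONparse extends RichFlatMapFunction<String, Tuple2<String, String>> {
    private transient ObjectMapper mapper;

    @Override
    public void open(Configuration parameters) {
        // Created once per task instance and reused for every event,
        // instead of once per event.
        mapper = new ObjectMapper();
    }

    @Override
    public void flatMap(String s, Collector<Tuple2<String, String>> out) throws Exception {
        JsonNode json = mapper.readTree(s);
        out.collect(new Tuple2<>("M", json.toString()));
    }
}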

Path to read a sound out of my package

I have a Sons class which loads and plays sounds, and an adhd class which contains the main and uses this Sons class.
All my classes are in the package "adhd", and in the jar my sounds are laid out like this: 1.wav is in SoundN, which is at the root of the jar (ADHD.jar/SoundN/1.wav).
When I run the code in Eclipse it works, but when I run the jar it doesn't. It is important for me to keep the sounds preloaded, because I need my program to play them quickly, as I am using timers. What do you suggest I do?
Here is the code of my Sons class, which loads the sounds into a singleton instance.
Sons
package adhd;

import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.Clip;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.FloatControl;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.applet.Applet;
import java.applet.AudioClip;
import java.net.URL;

public class Sons {

    private static String PATH = null;
    private static Sons instance;
    private final Map<String, Clip> sons;
    private boolean desactive;

    Sons(String path) {
        PATH = path;
        sons = new HashMap<String, Clip>();
    }

    public void load(String nom) throws UnsupportedAudioFileException, IOException, LineUnavailableException {
        Clip clip = AudioSystem.getClip();
        clip.open(AudioSystem.getAudioInputStream(getClass().getResourceAsStream(PATH + nom)));
        sons.put(nom, clip);
    }

    public void play(String son) {
        if (!desactive) try {
            Clip c = getSon(son); // getSon(...) not shown in the question; presumably a lookup in the sons map
            c.setFramePosition(0);
            c.start();
        } catch (Exception e) {
            System.err.println("Impossible to play the sound " + son);
            desactive = true;
        }
    }
}
Here is the adhd class which contains the main that uses the sounds.
Main class: adhd
public static void main(String[] args) {
    Sons sonN = new Sons("/SoundN/");
    try {
        sonN.load("1.wav");
    } catch (UnsupportedAudioFileException | IOException
            | LineUnavailableException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    sonN.play("1.wav");
}
Here is also a picture of the project tree. [image from the original question not reproduced here]
Now, thanks to the exception message, we know what the problem actually is. The problem is not that the sounds can't be loaded or aren't found in the jar. The problem is that, as the javadoc of AudioSystem.getAudioInputStream says:
These parsers must be able to mark the stream, read enough data to determine whether they support the stream, and, if not, reset the stream's read pointer to its original position. If the input stream does not support these operations, this method may fail with an IOException.
And the stream returned by Class.getResourceAsStream(), when the resource is loaded from a jar, doesn't support these operations. So what you can do is read everything from the input stream into a byte array in memory, create a ByteArrayInputStream from that byte array, and pass that stream to AudioSystem.getAudioInputStream().
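A minimal sketch of that fix for the load() method above, assuming Java 9+ for InputStream.readAllBytes() (on older versions, copy the stream manually):

// Additional imports needed: java.io.ByteArrayInputStream, java.io.InputStream
public void load(String nom) throws UnsupportedAudioFileException, IOException, LineUnavailableException {
    byte[] bytes;
    try (InputStream in = getClass().getResourceAsStream(PATH + nom)) {
        bytes = in.readAllBytes(); // buffer the whole jar resource in memory
    }
    // ByteArrayInputStream supports mark/reset, which the audio file parsers require
    Clip clip = AudioSystem.getClip();
    clip.open(AudioSystem.getAudioInputStream(new ByteArrayInputStream(bytes)));
    sons.put(nom, clip);
}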
If loading everything in memory is not an option because the sound is really long (but then I guess you wouldn't put it in the jar), you could write it to a temporary file and pass that File to AudioSystem.getAudioInputStream(File) -- a plain FileInputStream would have the same mark/reset limitation.

NullPointerException when adding keys in BloomFilter

I used the Apache Hadoop bloom utilities to create a counting Bloom filter. However, I get a NullPointerException when I try to add keys to it. I have tried changing the class structure in several ways, but I still get the same result.
Here is the code:
package package_name;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import org.apache.hadoop.util.bloom.*;

public class CBF {

    public static CountingBloomFilter CBF = new CountingBloomFilter();

    public static void countingFilter(ArrayList<byte[]> CBF_Keys) throws IOException {
        CBF_Keys = Keys.keyStringArray;
        Iterator<byte[]> iter = CBF_Keys.iterator();
        while (iter.hasNext()) {
            byte[] temp = iter.next();
            Key hadoop_key = new Key(temp, 2.0);
            CBF.add(hadoop_key);
        }
    }
}
The problem is CBF = new CountingBloomFilter(). You should use the CountingBloomFilter(int vectorSize, int nbHash, int hashType) constructor instead; otherwise the HashFunction is never constructed in the parent class Filter, and add() fails with a NullPointerException.
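A sketch of the fix: construct the filter through the sized constructor so the parent Filter class builds its HashFunction. The vector size and hash count below are illustrative values, not tuned recommendations.

import org.apache.hadoop.util.bloom.CountingBloomFilter;
import org.apache.hadoop.util.bloom.Key;
import org.apache.hadoop.util.hash.Hash;

public class CBF {
    // 1 << 20 bits, 4 hash functions, Murmur hashing (example parameters)
    public static CountingBloomFilter CBF =
            new CountingBloomFilter(1 << 20, 4, Hash.MURMUR_HASH);

    public static void main(String[] args) {
        CBF.add(new Key("example".getBytes(), 2.0)); // no NPE: hash function exists
        System.out.println("Key added");
    }
}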
