I'm building a stock screener that applies a calculation to each column of a CSV file. However, when I run the for loop, I only get one result back.
String path = "C:/Users/0/Desktop/Git/Finance/Data/NQ100.csv";
Reader buf = Files.newBufferedReader(Paths.get(path));
CSVParser parsed = new CSVParser(buf, CSVFormat.DEFAULT.withFirstRecordAsHeader()
        .withIgnoreHeaderCase().withTrim());

// Parse tickers
Map<String, Integer> header = parsed.getHeaderMap();
List<String> tickerList = new ArrayList<>(header.keySet());

for (int x = 1; x < tickerList.size(); x++) { // <----------------------- PROBLEM
    // Accessing closing price by Header names
    List<Double> closeList = new ArrayList<>();
    for (CSVRecord record : parsed) {
        String stringClose = record.get(x);
        Double close = Double.valueOf(stringClose);
        closeList.add(close);
    }
    // Percentage Change
    List<Double> pctList = new ArrayList<>();
    for (int i = 1; i < closeList.size(); i++) {
        Double pct = closeList.get(i) / closeList.get(i - 1) - 1;
        pctList.add(pct);
    }
    // Statistics
    Double sum = 0.0, var = 0.0, mean, sd, rfr, sr;
    // Mean
    for (Double num : pctList) sum += num;
    mean = sum / pctList.size();
    // Standard Deviation
    for (Double num : pctList) var += Math.pow(num - mean, 2);
    sd = Math.sqrt(var / pctList.size());
    // Risk Free Rate
    rfr = Math.pow((1 + 0.03), (1 / 252.0)) - 1;
    // Sharpe Ratio
    sr = Math.sqrt(252) * ((mean - rfr) / sd);
    System.out.println(tickerList.get(x) + " " + sr);
}
My data looks like this:
,AAL,AAPL,ADBE
2007-10-25,26.311651,23.141403,47.200001
2007-10-26,26.273216,23.384495,47.0
2007-10-29,26.004248,23.43387,47.0
So I was expecting:
AAL XXX
AAPL XXX
ADBE XXX
But I got just:
AAL 0.3604941921663456
I'd be grateful if you could help me find the problem!
You can iterate through an Iterable only once when it wraps a one-shot stream; in your case CSVParser parsed implements Iterable<CSVRecord>, but it reads directly from the file and cannot be rewound.
So you iterate through it only the first time, when you calculate the statistics for AAL; while analyzing the data for AAPL and ADBE it is handled as an empty one.
You can handle this by introducing a helper list initialized from the parser. Add the following code before the for cycle (in Java 8 it is a one-line solution of course, but this option will work for earlier versions too):
List<CSVRecord> records = new ArrayList<>();
for (CSVRecord record : parsed) {
    records.add(record);
}
And change the line:
for (CSVRecord record : parsed) {
to:
for (CSVRecord record : records) {
For the CSV you've provided, you will then get the following output:
AAL -21.583101145880306
AAPL 23.417753561072438
ADBE -16.75343297000953
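As an aside, in Java 8 the helper list can be built in one line; here is a sketch (Commons CSV's own CSVParser.getRecords() does the same thing):

// requires java.util.stream.StreamSupport and java.util.stream.Collectors
List<CSVRecord> records = StreamSupport.stream(parsed.spliterator(), false)
        .collect(Collectors.toList());
// or, equivalently, let Commons CSV drain the parser into a list for you:
List<CSVRecord> records2 = parsed.getRecords();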
Here's a block of code that works for me. If I understand your question, you just want to read each column and row from a CSV file. Hope it helps.
String cvsSplitBy = ",";   // column separator
String line;
int a = 0;                 // row counter, used to skip the header row
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(archivo), "UTF8"));
while ((line = br.readLine()) != null) {
    if (a != 0) { // skip the first (header) row
        String[] datos = line.split(cvsSplitBy);
        System.out.println(datos[0] + " - " + datos[1] + " - " + datos[2]);
    }
    a++;
}
My issue here is that I need to compute the average time for each id (from its start and end records).
Sample data
T1,2020-01-16,11:16pm,start
T2,2020-01-16,11:18pm,start
T1,2020-01-16,11:20pm,end
T2,2020-01-16,11:23pm,end
I have written code that keeps the first column and third column in a map, something like
T1, 11:16pm
but I was not able to compute values after keeping them in the map. I also tried to keep them in a string array and split it line by line, but I faced the same issue with that approach too.
public class AverageTimeGenerate {
    public static void main(String[] args) throws IOException {
        File file = new File("/abc.txt");
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            while (true) {
                String line = reader.readLine();
                if (line == null) {
                    break;
                }
                ArrayList<String> list = new ArrayList<>();
                String[] tokens = line.split(",");
                for (String s : tokens) {
                    list.add(s);
                }
                Map<String, String> map = new HashMap<>();
                String[] data = line.split(",");
                String ids = data[0];
                String dates = data[1];
                String transactionTime = data[2];
                String transactionStartAndEndTime = data[3];
                String[] transactionIds = ids.split("/n");
                String[] timeOfEachTransaction = transactionTime.split("/n");
                for (String id : transactionIds) {
                    for (String time : timeOfEachTransaction) {
                        map.put(id, time);
                    }
                }
            }
        }
    }
}
Can anyone tell me whether it is possible to find duplicate keys in a map and compute values there, or if there is any other way I can do this, so that the output looks like:
T1 2:00
T2 5:00
I don't know your exact logic for computing the average time, but you can save the data for one particular transaction in a map. The map structure can be like this: the transaction id will be the key, and all of its times go into an ArrayList.
Map<String,List<String>> map = new HashMap<String,List<String>>();
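Once the map is filled, computing the elapsed time per id is straightforward. Here is a minimal, self-contained sketch; it assumes each id has exactly one start and one end time stored in order, and times in the sample's "11:16pm" format (the class name and the hard-coded sample values are just for illustration):

import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ElapsedPerId {
    public static void main(String[] args) throws Exception {
        // stand-in for the map built while reading the file
        Map<String, List<String>> map = new HashMap<>();
        map.put("T1", Arrays.asList("11:16pm", "11:20pm"));
        map.put("T2", Arrays.asList("11:18pm", "11:23pm"));

        // "hh:mma" honors the am/pm suffix
        SimpleDateFormat fmt = new SimpleDateFormat("hh:mma", Locale.ENGLISH);
        for (Map.Entry<String, List<String>> e : map.entrySet()) {
            List<String> times = e.getValue();
            // end minus start, in milliseconds
            long ms = fmt.parse(times.get(1)).getTime() - fmt.parse(times.get(0)).getTime();
            System.out.println(e.getKey() + " " + ms / 60000 + ":" + ms % 60000 / 1000);
        }
    }
}

This prints T1 4:0 and T2 5:0; averaging across several records per id would then just be a division by the count.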
You can do it like this:
Map<String, String> result = Files.lines(Paths.get("abc.txt"))
        .map(line -> line.split(","))
        .map(arr -> {
            try {
                // "hh:mma" is needed to honor the am/pm suffix in e.g. "11:16pm"
                return new AbstractMap.SimpleEntry<>(arr[0],
                        new SimpleDateFormat("hh:mma").parse(arr[2]));
            } catch (ParseException e) {
                return null;
            }
        }).collect(Collectors.groupingBy(Map.Entry::getKey,
                Collectors.collectingAndThen(Collectors
                                .mapping(Map.Entry::getValue, Collectors.toList()),
                        list -> toStringTime.apply(convert.apply(list)))));
To simplify, I've declared two helper functions (declare them before the stream pipeline so they are in scope):
Function<List<Date>, Long> convert = list -> (list.get(1).getTime() - list.get(0).getTime()) / 2;
Function<Long, String> toStringTime = l -> l / 60000 + ":" + l % 60000 / 1000;
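For the sample data, for instance, convert receives T1's parsed times (11:16pm and 11:20pm) and returns (4 min) / 2 = 120000 ms, which toStringTime then renders as "2:0".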
I have three input fields.
First Name
Last Name
Date Of Birth
I would like to get random data for each input from a property file.
This is how the property file looks. Field name and = should be ignored.
- First Name= Robert, Brian, Shawn, Bay, John, Paul
- Last Name= Jerry, Adam ,Lu , Eric
- Date of Birth= 01/12/12,12/10/12,1/2/17
Example: For First Name: File should randomly select one name from the following names
Robert, Brian, Shawn, Bay, John, Paul
Also I need to ignore anything before =
FileInputStream objfile = new FileInputStream(System.getProperty("user.dir") + path);
in = new BufferedReader(new InputStreamReader(objfile));
String line = in.readLine();
while (line != null && !line.trim().isEmpty()) {
    String[] eachRecord = line.trim().split(",");
    Random rand = new Random();
    //I need to pick first name randomly from the file from row 1.
    send(firstName, (eachRecord[0]));
If you know that you're always going to have just those 3 lines in your property file, I would put each one into a map with an index as the key, then randomly generate a key in the range of the map.
// your code here to read the file in
HashMap<Integer, String> firstNameMap = new HashMap<Integer, String>();
HashMap<Integer, String> lastNameMap = new HashMap<Integer, String>();
HashMap<Integer, String> dobMap = new HashMap<Integer, String>();
String line;
while ((line = in.readLine()) != null) {
    String[] parts = line.split("=");
    if (parts[0].trim().equals("First Name")) {
        String[] values = parts[1].split(",");
        for (int i = 0; i < values.length; ++i) {
            firstNameMap.put(i, values[i].trim());
        }
    }
    else if (parts[0].trim().equals("Last Name")) {
        // do the same as FN but for lastNameMap
    }
    else if (parts[0].trim().equals("Date of Birth")) {
        // do the same as FN but for dobMap
    }
}
// Now you can use the size of the map and a random number to get a value
// first name for instance:
int randomNum = ThreadLocalRandom.current().nextInt(0, firstNameMap.size());
System.out.println("First Name: " + firstNameMap.get(randomNum));
// and you would do the same for the other fields
The code can easily be refactored with some helper methods to make it cleaner, we'll leave that as a HW assignment :)
This way you have a cache of all your values that you can call at any time to get a random value. I realize this isn't the most optimal solution, having nested loops and 3 different maps, but if your input file only contains 3 lines and you're not expecting millions of inputs, it should be just fine.
Haven't programmed stuff like this in a long time.
Feel free to test it, and let me know if it works.
The result of this code should be a HashMap object called values
You can then get the specific fields you want from it, using get(field_name)
For example - values.get("First Name"). Make sure to use the correct case, because "first name" won't work.
If you want it all to be lower case, you can just add .toLowerCase() at the end of the line that puts the field and value into the HashMap
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;

public class Test
{
    // arguments are passed using the text field below this editor
    public static void main(String[] args) throws IOException
    {
        // set the value of "in" here, so you actually read from it
        // (the file name below is just a placeholder - point it at your property file)
        BufferedReader in = new BufferedReader(new FileReader("input.properties"));
        HashMap<String, String> values = new HashMap<String, String>();
        String line;
        while (((line = in.readLine()) != null) && !line.trim().isEmpty()) {
            if (!line.contains("=")) {
                continue;
            }
            String[] lineParts = line.split("=");
            String[] eachRecord = lineParts[1].split(",");
            System.out.println("adding value of field type = " + lineParts[0].trim());
            // now add the mapping to the values HashMap - values[field_name] = random_field_value
            values.put(lineParts[0].trim(), eachRecord[(int) (Math.random() * eachRecord.length)].trim());
        }
        System.out.println("First Name = " + values.get("First Name"));
        System.out.println("Last Name = " + values.get("Last Name"));
        System.out.println("Date of Birth = " + values.get("Date of Birth"));
    }
}
I have two CSV files. One master CSV file with around 500,000 records. Another daily CSV file with 50,000 records.
The daily CSV file is missing a few columns, which have to be fetched from the master CSV file.
For example
DailyCSV File
id,name,city,zip,occupation
1,Jhon,Florida,50069,Accountant
MasterCSV File
id,name,city,zip,occupation,company,exp,salary
1, Jhon, Florida, 50069, Accountant, AuditFirm, 3, $5000
What I have to do is read both files and match the records by ID; if an ID is present in the master file, then I have to fetch company, exp, and salary and write them to a new CSV file.
How can I achieve this?
What I have done currently:
while (true) {
    line = bstream.readLine();
    lineMaster = bstreamMaster.readLine();
    if (line == null || lineMaster == null)
    {
        break;
    }
    else
    {
        while (lineMaster != null)
            readlineSplit = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", -1);
        String splitId = readlineSplit[4];
        String[] readLineSplitMaster = lineMaster.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", -1);
        String SplitIDMaster = readLineSplitMaster[13];
        System.out.println(splitId + "|" + SplitIDMaster);
        //System.out.println(splitId.equalsIgnoreCase(SplitIDMaster));
        if (splitId.equalsIgnoreCase(SplitIDMaster)) {
            String writeLine = readlineSplit[0] + "," + readlineSplit[1] + "," + readlineSplit[2] + "," + readlineSplit[3] + "," + readlineSplit[4] + "," + readlineSplit[5] + "," + readLineSplitMaster[15] + "," + readLineSplitMaster[16] + "," + readLineSplitMaster[17];
            System.out.println(writeLine);
            pstream.print(writeLine + "\r\n");
        }
    }
}
pstream.close();
fout.flush();
bstream.close();
bstreamMaster.close();
First of all, your current parsing approach will be painfully slow. Use a dedicated CSV parsing library to speed things up. With uniVocity-parsers you can process your 500K records in less than a second. This is how you can use it to solve your problem:
First let's define a few utility methods to read/write your files:
//opens the file for reading (using UTF-8 encoding)
private static Reader newReader(String pathToFile) {
    try {
        return new InputStreamReader(new FileInputStream(new File(pathToFile)), "UTF-8");
    } catch (Exception e) {
        throw new IllegalArgumentException("Unable to open file for reading at " + pathToFile, e);
    }
}

//creates a file for writing (using UTF-8 encoding)
private static Writer newWriter(String pathToFile) {
    try {
        return new OutputStreamWriter(new FileOutputStream(new File(pathToFile)), "UTF-8");
    } catch (Exception e) {
        throw new IllegalArgumentException("Unable to open file for writing at " + pathToFile, e);
    }
}
Then, we can start reading your daily CSV file, and generate a Map:
public static void main(String... args) {
    //First we parse the daily update file.
    CsvParserSettings settings = new CsvParserSettings();
    //here we tell the parser to read the CSV headers
    settings.setHeaderExtractionEnabled(true);
    //and to select ONLY the following columns.
    //This ensures rows with a fixed size will be returned in case some records come with less or more columns than anticipated.
    settings.selectFields("id", "name", "city", "zip", "occupation");
    CsvParser parser = new CsvParser(settings);

    //Here we parse all data into a list.
    List<String[]> dailyRecords = parser.parseAll(newReader("/path/to/daily.csv"));

    //And convert them to a map. ID's are the keys.
    Map<String, String[]> mapOfDailyRecords = toMap(dailyRecords);

    ... //we'll get back here in a second.
This is the code to generate a Map from the list of daily records:
/* Converts a list of records to a map. Uses element at index 0 as the key */
private static Map<String, String[]> toMap(List<String[]> records) {
    HashMap<String, String[]> map = new HashMap<String, String[]>();
    for (String[] row : records) {
        //column 0 will always have an ID.
        map.put(row[0], row);
    }
    return map;
}
With the map of records, we can process your master file and generate the list of updates:
private static List<Object[]> processMasterFile(final Map<String, String[]> mapOfDailyRecords) {
    //we'll put the updated data here
    final List<Object[]> output = new ArrayList<Object[]>();

    //configures the parser to process only the columns you are interested in.
    CsvParserSettings settings = new CsvParserSettings();
    settings.setHeaderExtractionEnabled(true);
    settings.selectFields("id", "company", "exp", "salary");

    //All parsed rows will be submitted to the following RowProcessor. This way the bigger Master file won't
    //have all its rows stored in memory.
    settings.setRowProcessor(new AbstractRowProcessor() {
        @Override
        public void rowProcessed(String[] row, ParsingContext context) {
            // Incoming rows from MASTER will have the ID as index 0.
            // If the daily update map contains the ID, we'll get the daily row
            String[] dailyData = mapOfDailyRecords.get(row[0]);
            if (dailyData != null) {
                //We got a match. Let's join the data from the daily row with the master row.
                Object[] mergedRow = new Object[8];
                for (int i = 0; i < dailyData.length; i++) {
                    mergedRow[i] = dailyData[i];
                }
                for (int i = 1; i < row.length; i++) { //starts from 1 to skip the ID at index 0
                    mergedRow[i + dailyData.length - 1] = row[i];
                }
                output.add(mergedRow);
            }
        }
    });

    CsvParser parser = new CsvParser(settings);
    //the parse() method will submit all rows to the RowProcessor defined above.
    parser.parse(newReader("/path/to/master.csv"));
    return output;
}
Finally, we can get the merged data and write everything to another file:
    ... // getting back to the main method here

    //Now we process the master data and get a list of updates
    List<Object[]> updatedData = processMasterFile(mapOfDailyRecords);

    //And write the updated data to another file
    CsvWriterSettings writerSettings = new CsvWriterSettings();
    writerSettings.setHeaders("id", "name", "city", "zip", "occupation", "company", "exp", "salary");
    writerSettings.setHeaderWritingEnabled(true);

    CsvWriter writer = new CsvWriter(newWriter("/path/to/updates.csv"), writerSettings);
    //Here we write everything, and get the job done.
    writer.writeRowsAndClose(updatedData);
}
This should work like a charm. Hope it helps.
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
I will approach the problem in a step-by-step manner.
First, I will parse/read the master CSV file and keep its content in a hashmap, where the key will be each record's unique 'id'. As for the value, you can store it in another hash or simply create a Java class to hold the information.
Example of hash:
{
'1' : { 'name': 'Jhon',
'City': 'Florida',
'zip' : 50069,
....
}
}
Next, read your daily CSV file. For each row, read the 'id' and check if the key exists in the hashmap you created earlier.
If it exists, access the information you need from the hashmap and write it to a new CSV file.
Also, you might want to consider using a 3rd-party CSV parser to make this task easier.
If you have Maven you can follow this example I found on the net; otherwise you can just google for an Apache Commons CSV parser example (see the sketch after the link):
http://examples.javacodegeeks.com/core-java/apache/commons/csv-commons/writeread-csv-files-with-apache-commons-csv-example/
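A minimal sketch of this approach using Apache Commons CSV (the file names and the use of System.out instead of a real output file are assumptions; the column names come from the question's sample data). Note that it holds the whole master file in memory, which is usually fine for 500K short rows:

import java.io.FileReader;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class MergeSketch {
    public static void main(String[] args) throws Exception {
        // Step 1: index the master file by id.
        Map<String, CSVRecord> masterById = new HashMap<>();
        try (Reader in = new FileReader("master.csv");
             CSVParser master = CSVFormat.DEFAULT.withFirstRecordAsHeader()
                     .withIgnoreSurroundingSpaces().parse(in)) {
            for (CSVRecord r : master) {
                masterById.put(r.get("id"), r);
            }
        }
        // Step 2: stream the daily file and look each id up in the index.
        try (Reader in = new FileReader("daily.csv");
             CSVParser daily = CSVFormat.DEFAULT.withFirstRecordAsHeader()
                     .withIgnoreSurroundingSpaces().parse(in)) {
            for (CSVRecord r : daily) {
                CSVRecord m = masterById.get(r.get("id"));
                if (m != null) {
                    // write the daily columns plus company/exp/salary from the master
                    System.out.println(r.get("id") + "," + r.get("name") + "," + r.get("city") + ","
                            + r.get("zip") + "," + r.get("occupation") + ","
                            + m.get("company") + "," + m.get("exp") + "," + m.get("salary"));
                }
            }
        }
    }
}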
I'm using Apache OpenNLP and I'd like to extract the keyphrases of a given text. I'm already gathering entities, but I would like to have keyphrases.
The problem I have is that I can't use TF-IDF, because I don't have models for that and I only have a single text (not multiple documents).
Here is some code (prototyped, not so clean):
public List<KeywordsModel> extractKeywords(String text, NLPProvider pipeline) {
    SentenceDetectorME sentenceDetector = new SentenceDetectorME(pipeline.getSentencedetecto("en"));
    TokenizerME tokenizer = new TokenizerME(pipeline.getTokenizer("en"));
    POSTaggerME posTagger = new POSTaggerME(pipeline.getPosmodel("en"));
    ChunkerME chunker = new ChunkerME(pipeline.getChunker("en"));
    ArrayList<String> stopwords = pipeline.getStopwords("en");

    Span[] sentSpans = sentenceDetector.sentPosDetect(text);
    Map<String, Float> results = new LinkedHashMap<>();
    SortedMap<String, Float> sortedData = new TreeMap(new MapSort.FloatValueComparer(results));

    float sentenceCounter = sentSpans.length;
    float prominenceVal = 0;
    int sentences = sentSpans.length;
    for (Span sentSpan : sentSpans) {
        prominenceVal = sentenceCounter / sentences;
        sentenceCounter--;
        String sentence = sentSpan.getCoveredText(text).toString();
        int start = sentSpan.getStart();
        Span[] tokSpans = tokenizer.tokenizePos(sentence);
        String[] tokens = new String[tokSpans.length];
        for (int i = 0; i < tokens.length; i++) {
            tokens[i] = tokSpans[i].getCoveredText(sentence).toString();
        }
        String[] tags = posTagger.tag(tokens);
        Span[] chunks = chunker.chunkAsSpans(tokens, tags);
        for (Span chunk : chunks) {
            if ("NP".equals(chunk.getType())) {
                int npstart = start + tokSpans[chunk.getStart()].getStart();
                int npend = start + tokSpans[chunk.getEnd() - 1].getEnd();
                String potentialKey = text.substring(npstart, npend);
                if (!results.containsKey(potentialKey)) {
                    boolean hasStopWord = false;
                    String[] pKeys = potentialKey.split("\\s+");
                    if (pKeys.length < 3) {
                        for (String pKey : pKeys) {
                            for (String stopword : stopwords) {
                                if (pKey.toLowerCase().matches(stopword)) {
                                    hasStopWord = true;
                                    break;
                                }
                            }
                            if (hasStopWord == true) {
                                break;
                            }
                        }
                    } else {
                        hasStopWord = true;
                    }
                    if (hasStopWord == false) {
                        int count = StringUtils.countMatches(text, potentialKey);
                        results.put(potentialKey, (float) (Math.log(count) / 100) + (float) (prominenceVal / 5));
                    }
                }
            }
        }
    }
    sortedData.putAll(results);
    System.out.println(sortedData);
    return null;
}
What it basically does is give me the noun phrases back, sorted by prominence value (where do they appear in the text?) and counts.
But honestly, this doesn't work so well.
I also tried it with a Lucene analyzer, but the results were not good either.
So, how can I achieve what I want to do? I already know of KEA/Maui-indexer etc. (but I'm afraid I can't use them because of the GPL :( )
Also interesting: which other algorithms can I use instead of TF-IDF?
Example:
This text: http://techcrunch.com/2015/09/04/etsys-pulling-the-plug-on-grand-st-at-the-end-of-this-month/
Good output in my opinion: Etsy, Grand St., solar chargers, maker marketplace, tech hardware
Finally, I found something:
https://github.com/srijiths/jtopia
It uses the POS taggers from openNLP/stanfordNLP and has an ASL 2.0 license. I haven't measured precision and recall yet, but in my opinion it delivers great results.
Here is my code:
Configuration.setTaggerType("openNLP");
Configuration.setSingleStrength(6);
Configuration.setNoLimitStrength(5);
// if tagger type is "openNLP" then give the openNLP POS tagger path
//Configuration.setModelFileLocation("model/openNLP/en-pos-maxent.bin");
// if tagger type is "default" then give the default POS lexicon file
//Configuration.setModelFileLocation("model/default/english-lexicon.txt");
// if tagger type is "stanford "
Configuration.setModelFileLocation("Dont need that here");
Configuration.setPipeline(pipeline);
TermsExtractor termExtractor = new TermsExtractor();
TermDocument topiaDoc = new TermDocument();
topiaDoc = termExtractor.extractTerms(text);
//logger.info("Extracted terms : " + topiaDoc.getExtractedTerms());
Map<String, ArrayList<Integer>> finalFilteredTerms = topiaDoc.getFinalFilteredTerms();
List<KeywordsModel> keywords = new ArrayList<>();
for (Map.Entry<String, ArrayList<Integer>> e : finalFilteredTerms.entrySet()) {
    KeywordsModel keyword = new KeywordsModel();
    keyword.setLabel(e.getKey());
    keywords.add(keyword);
}
I modified the Configuration file a bit so that the POSModel is loaded from the pipeline instance.
I have a big CSV file, thousands of rows, and I want to aggregate some columns using Java code.
The file is in the form:
1,2012,T1
2,2015,T2
3,2013,T1
4,2012,T1
The results should be:
T, Year, Count
T1,2012, 2
T1,2013, 1
T2,2015, 1
Put your data into a Map-like structure, and add +1 to the stored value each time a key (in your case "" + T + year) is found.
You can use a map like:
Map<String, Integer> rowMap = new HashMap<>();
rowMap.put("T1", 1);
rowMap.put("T2", 2);
rowMap.put("2012", 1);
or you can define your own class with T and Year fields, overriding the hashCode and equals methods. Then you can use:
Map<YourClass, Integer> map = new HashMap<>();
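A minimal sketch of such a key class (the class and field names are just for illustration):

import java.util.Objects;

// Composite key: two instances with the same T and year are equal,
// so the HashMap counts them in the same bucket.
final class TypeYear {
    final String t;
    final String year;

    TypeYear(String t, String year) {
        this.t = t;
        this.year = year;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TypeYear)) return false;
        TypeYear other = (TypeYear) o;
        return t.equals(other.t) && year.equals(other.year);
    }

    @Override
    public int hashCode() {
        return Objects.hash(t, year);
    }
}

With that in place, counting a parsed line becomes map.merge(new TypeYear(fields[2], fields[1]), 1, Integer::sum).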
String csv =
        "1,2012,T1\n"
        + "2,2015,T2\n"
        + "3,2013,T1\n"
        + "4,2012,T1\n";

Map<String, Integer> map = new TreeMap<>();

BufferedReader reader = new BufferedReader(new StringReader(csv));
String line;
while ((line = reader.readLine()) != null) {
    String[] fields = line.split(",");
    String key = fields[2] + "," + fields[1];
    Integer value = map.get(key);
    if (value == null)
        value = 0;
    map.put(key, value + 1);
}

System.out.println(map);
// -> {T1,2012=2, T1,2013=1, T2,2015=1}
Use uniVocity-parsers for the best performance. It should take 1 second to process 1 million rows.
CsvParserSettings settings = new CsvParserSettings();
settings.selectIndexes(1, 2); //select the columns we are going to read
final Map<List<String>, Integer> results = new LinkedHashMap<List<String>, Integer>(); //stores the results here

//Use a custom implementation of RowProcessor
settings.setRowProcessor(new AbstractRowProcessor() {
    @Override
    public void rowProcessed(String[] row, ParsingContext context) {
        List<String> key = Arrays.asList(row); // converts the input array to a List - lists implement hashCode and equals based on their values so they can be used as keys on your map.
        Integer count = results.get(key);
        if (count == null) {
            count = 0;
        }
        results.put(key, count + 1);
    }
});

//creates a parser with the above configuration and RowProcessor
CsvParser parser = new CsvParser(settings);

String input = "1,2012,T1"
        + "\n2,2015,T2"
        + "\n3,2013,T1"
        + "\n4,2012,T1";

//the parse() method will parse and submit all rows to your RowProcessor - use a FileReader to read a file instead of the String I'm using as an example.
parser.parse(new StringReader(input));

//Here are the results:
for (Entry<List<String>, Integer> entry : results.entrySet()) {
    System.out.println(entry.getKey() + " -> " + entry.getValue());
}
Output:
[2012, T1] -> 2
[2015, T2] -> 1
[2013, T1] -> 1
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).