Compare array to text file - java

I have a text file, "example.csv", that contains thousands of rows.
The text file contains:
Dog, 3123
Cat, 6544
Chicken, 8943
And another, "example2.csv", that also contains thousands of rows:
Fruit, 3243
Banana, 9432
Chicken, 2043
And an array that contains hundreds of rows:
Home, Dan, Dog, 4234
Home, Bug, Chicken, 3213
Home, Hds, Banana, 4324
Out, Bgh, Poodle, 3129
I need to change the third column in the array to the value found in the CSV, example.csv. If it is not found in the first one, it needs to look in the second one, example2.csv.
As you can see in my example, Chicken appears in both, but it needs to find the value from example.csv. However, Fruit only appears in the second, so it needs to find the value from example2.csv.
Anything not found in either CSV needs to be displayed as missing.
An example being:
Home, Dan, Dog, 4234
Home, Bug, Chicken, 3213
Home, Hds, Banana, 4324
Out, Bgh, Poodle, 3129
BECOMES
Home, Dan, 3123, 4234
Home, Bug, 8943, 3213
Home, Hds, 9432, 4324
Out, Bgh, Poodle, 3129 : Display :Missing - Poodle
Thank you!
Number is the array, and isFound starts off as false.
splitBy is: ","
Here is my attempt:
try {
//Scanner inFile = new Scanner (new FileReader("Files\\"+"ABTutor2018Sem2"+".csv")); // import file
brStudents = new BufferedReader(new FileReader("Files\\deploystudio.csv"));
brStudents2 = new BufferedReader(new FileReader("Files\\deploystudio2.csv"));
while ((line = brStudents.readLine()) != null) {
arrSubjectsDeploy = line.split(splitBy);
isFound = false;
for(int h = 0; h < l; h++) {
arrSubjects = Number[h].split(splitBy);
line2 = "";
errorMessage = arrSubjectsDeploy[0];
if(arrSubjects[2].equals(arrSubjectsDeploy[0])) {
arrSubjects[2] = arrSubjectsDeploy[1];
m++;
tempHolder = "";
isFound = true;
}
if(h == l-1 && isFound == false) {
System.out.println("Missing: " + errorMessage);
}
}
}
}
This only tries to handle the first text file, and it still does not work, because it looks at the contents of the text file and checks whether they match the array, rather than the other way around.

You can use Map<String,String> to store contents of example.csv and example2.csv. Then iterate over your input array to check if it exists in either file.
Map<String, String> firstFile = readFileAndConvert("Files\\deploystudio.csv");
Map<String, String> secondFile = readFileAndConvert("Files\\deploystudio2.csv");
int index = 2;
String[][] finalArray =
{{"Home", "Dan", "Dog", "4234"}, {"Home", "Bug", "Chicken", "3213"},
{"Home", " Hds", "Banana", "4324"}, {"Out", " Bgh", "Poodle", "3129"}};
for(String[] row : finalArray) {
String valueToFind = row[index];
if(firstFile.containsKey(valueToFind)) {
row[index] = firstFile.get(valueToFind);
} else if(secondFile.containsKey(valueToFind)) {
row[index] = secondFile.get(valueToFind);
} else {
System.out.println("Missing " + valueToFind);
}
}
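The readFileAndConvert function below uses Java 8, but you can achieve similar functionality with any other version too.
// needs java.io.IOException, java.nio.file.Files, java.nio.file.Paths,
// java.util.Map, java.util.stream.Collectors and java.util.stream.Stream
private Map<String, String> readFileAndConvert(String path) throws IOException {
    String SEPARATOR = ",";
    try (Stream<String> lines = Files.lines(Paths.get(path))) {
        return lines.map(line -> line.split(SEPARATOR))
            // trim, because the sample rows have a space after the comma ("Dog, 3123")
            .collect(Collectors.toMap(array -> array[0].trim(), array -> array[1].trim()));
    }
}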
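For setups without streams, the same lookup can be built with a plain BufferedReader loop. The following is only a sketch under stated assumptions, not the answer's original code: the class name CsvLookup and the methods readToMap and resolve are made up for illustration, and the input is passed as a Reader so the example runs without files on disk. It trims the space after the comma, skips lines that have no comma, and keeps the first value when a key repeats (Collectors.toMap would throw an IllegalStateException on a duplicate key).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; all names here are illustrative only.
class CsvLookup {

    // Reads "key, value" lines into a map; the first occurrence of a key wins.
    static Map<String, String> readToMap(Reader source) throws IOException {
        Map<String, String> map = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",");
                if (parts.length < 2) {
                    continue; // skip blank or malformed lines
                }
                String key = parts[0].trim();
                if (!map.containsKey(key)) {
                    map.put(key, parts[1].trim());
                }
            }
        }
        return map;
    }

    // Returns the value from the first map, else from the second, else null.
    static String resolve(String key, Map<String, String> first, Map<String, String> second) {
        if (first.containsKey(key)) {
            return first.get(key);
        }
        return second.get(key);
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> first = readToMap(new StringReader("Dog, 3123\nCat, 6544\nChicken, 8943\n"));
        Map<String, String> second = readToMap(new StringReader("Fruit, 3243\nBanana, 9432\nChicken, 2043\n"));
        for (String name : new String[] {"Dog", "Chicken", "Banana", "Poodle"}) {
            String value = resolve(name, first, second);
            System.out.println(value != null ? name + " -> " + value : "Missing " + name);
        }
    }
}
```

With real files, a FileReader for each path would be passed instead of the StringReader used in main.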

Related

Join csv files based on common column in java

I want to join two CSV files based on a common column. My two CSV files and the final CSV file look like this.
Here are the example files - 1st file looks like:
sno,first name,last name
--------------------------
1,xx,yy
2,aa,bb
2nd file looks like:
sno,place
-----------
1,pp
2,qq
Output:
sno,first name,last name,place
------------------------------
1,xx,yy,pp
2,aa,bb,qq
Code:
CSVReader r1 = new CSVReader(new FileReader("c:/csv/file1.csv"));
CSVReader r2 = new CSVReader(new FileReader("c:/csv/file2.csv"));
HashMap<String,String[]> dic = new HashMap<String,String[]>();
int commonCol = 1;
r1.readNext(); // skip header
String[] line = null;
while ((line = r1.readNext()) != null)
{
    dic.put(line[commonCol], line);
}
commonCol = 1;
r2.readNext(); // skip header
String[] line2 = null;
while ((line2 = r2.readNext()) != null)
{
    if (dic.containsKey(line2[commonCol]))
    {
        // append line to existing entry
    }
    else
    {
        // create a new entry and pre-pend it with default values
        // for the columns of file1
    }
}
for (String[] row : dic.values())
{
    // write row to the output file.
}
I don't know how to proceed further to get desired output. Any help will be appreciated.
Thanks
First, you need to use zero as your commonCol value as the first column has index zero rather than one.
if (dic.containsKey(line2[commonCol]))
{
    // Get the whole line from the first file.
    String[] firstPart = dic.get(line2[commonCol]);
    // Get the line from the second file, without the common column (index 0).
    String[] secondPart = Arrays.copyOfRange(line2, 1, line2.length);
    // Join both parts together and put the result back in the HashMap.
    String[] joined = Arrays.copyOf(firstPart, firstPart.length + secondPart.length);
    System.arraycopy(secondPart, 0, joined, firstPart.length, secondPart.length);
    dic.put(line2[commonCol], joined);
}
else
{
    // create a new entry and pre-pend it with default values
    // for the columns of file1 (the key first, then placeholders)
    String[] firstPart = {line2[commonCol], "default", "default"};
    String[] secondPart = Arrays.copyOfRange(line2, 1, line2.length);
    String[] joined = Arrays.copyOf(firstPart, firstPart.length + secondPart.length);
    System.arraycopy(secondPart, 0, joined, firstPart.length, secondPart.length);
    dic.put(line2[commonCol], joined);
}
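To see the whole join in one piece, here is a small self-contained sketch that works on rows already split into String arrays, so it does not depend on any CSV library. The class name CsvJoin, the method join, and the firstWidth and missing parameters are assumptions made for this illustration. It assumes the common column is at index 0 in both inputs and that the headers have already been skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; joins two lists of rows on column 0.
class CsvJoin {

    // firstWidth is the number of columns in the first file; rows that appear
    // only in the second file get `missing` in place of the absent columns.
    static List<String> join(List<String[]> first, List<String[]> second,
                             int firstWidth, String missing) {
        // LinkedHashMap keeps the rows in the order they were first seen.
        Map<String, List<String>> byKey = new LinkedHashMap<>();
        for (String[] row : first) {
            byKey.put(row[0], new ArrayList<>(Arrays.asList(row)));
        }
        for (String[] row : second) {
            List<String> merged = byKey.get(row[0]);
            if (merged == null) {
                // Key not in the first file: start with the key plus placeholders.
                merged = new ArrayList<>();
                merged.add(row[0]);
                for (int i = 1; i < firstWidth; i++) {
                    merged.add(missing);
                }
                byKey.put(row[0], merged);
            }
            // Append every column of the second row except the common one.
            merged.addAll(Arrays.asList(row).subList(1, row.length));
        }
        List<String> out = new ArrayList<>();
        for (List<String> columns : byKey.values()) {
            out.add(String.join(",", columns));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> file1 = Arrays.asList(new String[] {"1", "xx", "yy"}, new String[] {"2", "aa", "bb"});
        List<String[]> file2 = Arrays.asList(new String[] {"1", "pp"}, new String[] {"2", "qq"});
        for (String line : join(file1, file2, 3, "")) {
            System.out.println(line);
        }
    }
}
```

Rows that exist only in the first file are kept unchanged in this sketch, so they end up one column shorter than the joined rows; padding them would be a small extension.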

How to select random text value from specific row using java

I have three input fields.
First Name
Last Name
Date Of Birth
I would like to get random data for each input from a property file.
This is how the property file looks. Field name and = should be ignored.
- First Name= Robert, Brian, Shawn, Bay, John, Paul
- Last Name= Jerry, Adam ,Lu , Eric
- Date of Birth= 01/12/12,12/10/12,1/2/17
Example: For First Name: File should randomly select one name from the following names
Robert, Brian, Shawn, Bay, John, Paul
Also I need to ignore anything before =
FileInputStream objfile = new FileInputStream(System.getProperty("user.dir") + path);
in = new BufferedReader(new InputStreamReader(objfile));
String line = in.readLine();
while (line != null && !line.trim().isEmpty()) {
    String eachRecord[] = line.trim().split(",");
    Random rand = new Random();
    // I need to pick first name randomly from the file from row 1.
    send(firstName, (eachRecord[0]));
    line = in.readLine();
}
If you know that you're always going to have just those 3 lines in your property file, I would put each into a map with an index as the key, then randomly generate a key in the range of the map.
// your code here to read the file in
Map<Integer, String> firstNameMap = new HashMap<Integer, String>();
Map<Integer, String> lastNameMap = new HashMap<Integer, String>();
Map<Integer, String> dobMap = new HashMap<Integer, String>();
String line;
while ((line = in.readLine()) != null) {
    String[] parts = line.split("=");
    String field = parts[0].trim();
    if (field.equals("First Name")) {
        String[] values = parts[1].split(",");
        for (int i = 0; i < values.length; ++i) {
            firstNameMap.put(i, values[i].trim());
        }
    }
    else if (field.equals("Last Name")) {
        // do the same as FN but for lastNameMap
    }
    else if (field.equals("Date of Birth")) {
        // do the same as FN but for dobMap
    }
}
// Now you can use the size of the map and a random number to get a value
// first name for instance (the upper bound of nextInt is exclusive):
int randomNum = ThreadLocalRandom.current().nextInt(0, firstNameMap.size());
System.out.println("First Name: " + firstNameMap.get(randomNum));
// and you would do the same for the other fields
The code can easily be refactored with some helper methods to make it cleaner, we'll leave that as a HW assignment :)
This way you have a cache of all your values that you can call at any time to get a random value. I realize this isn't the optimal solution, with nested loops and 3 different maps, but if your input file only contains 3 lines and you're not expecting millions of inputs, it should be just fine.
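One possible shape for that refactoring is sketched below. It is only an illustration under assumptions: the class RandomFieldPicker and the methods parse and pick are invented names, the lines are passed in as a List so the example needs no file, and a single map from field name to a list of values replaces the three index-keyed maps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical helper; names are illustrative only.
class RandomFieldPicker {

    // Turns lines like "First Name= Robert, Brian" into field name -> values.
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> fields = new HashMap<>();
        for (String line : lines) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // not a "name=values" line
            }
            String name = line.substring(0, eq).trim();
            List<String> values = new ArrayList<>();
            for (String value : line.substring(eq + 1).split(",")) {
                if (!value.trim().isEmpty()) {
                    values.add(value.trim());
                }
            }
            fields.put(name, values);
        }
        return fields;
    }

    // Returns a random value for the field, or null if the field is unknown or empty.
    static String pick(Map<String, List<String>> fields, String name, Random random) {
        List<String> values = fields.get(name);
        if (values == null || values.isEmpty()) {
            return null;
        }
        return values.get(random.nextInt(values.size()));
    }

    public static void main(String[] args) {
        Map<String, List<String>> fields = parse(Arrays.asList(
                "First Name= Robert, Brian, Shawn, Bay, John, Paul",
                "Last Name= Jerry, Adam ,Lu , Eric",
                "Date of Birth= 01/12/12,12/10/12,1/2/17"));
        Random random = new Random();
        System.out.println("First Name: " + pick(fields, "First Name", random));
        System.out.println("Last Name: " + pick(fields, "Last Name", random));
        System.out.println("Date of Birth: " + pick(fields, "Date of Birth", random));
    }
}
```

Keeping every value in the list means a new random pick can be made at any time without re-reading the file.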
Haven't programmed stuff like this in a long time.
Feel free to test it, and let me know if it works.
The result of this code should be a HashMap object called values
You can then get the specific fields you want from it, using get(field_name)
For example: values.get("First Name"). Make sure to use the correct case, because "first name" won't work.
If you want it all to be lower case, you can just add .toLowerCase() at the end of the line that puts the field and value into the HashMap
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
public class Test
{
    // arguments are passed using the text field below this editor
    public static void main(String[] args) throws IOException
    {
        // set the value of "in" here, so you actually read from it
        // (assumption: the property file path is passed as the first argument)
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        HashMap<String, String> values = new HashMap<String, String>();
        String line;
        while (((line = in.readLine()) != null) && !line.trim().isEmpty()) {
            if(!line.contains("=")) {
                continue;
            }
            String[] lineParts = line.split("=");
            String[] eachRecord = lineParts[1].split(",");
            System.out.println("adding value of field type = " + lineParts[0].trim());
            // now add the mapping to the values HashMap - values[field_name] = random_field_value
            values.put(lineParts[0].trim(), eachRecord[(int) (Math.random() * eachRecord.length)].trim());
        }
        in.close();
        System.out.println("First Name = " + values.get("First Name"));
        System.out.println("Last Name = " + values.get("Last Name"));
        System.out.println("Date of Birth = " + values.get("Date of Birth"));
    }
}

How to add values from a file into an array using split?

I have the following code:
BufferedReader metaRead = new BufferedReader(new FileReader(metaFile));
String metaLine = "";
String [] metaData = new String [100000];
while ((metaLine = metaRead.readLine()) != null){
metaData = metaLine.split(",");
for (int i = 0; i < metaData.length; i++)
System.out.println(metaData[0]);
}
This is what's in the file:
testTable2 Name java.lang.Integer TRUE test
testTable2 age java.lang.String FALSE test
testTable2 ID java.lang.Integer FALSE test
I want the array to have at metaData[0] testTable2, metaData[1] would be Name, but when I run it at 0 I get testtable2testtable2testtable2, and at 1 I'd get NameageID and OutOfBoundsException.
Any ideas what to do in order to get the result I want?
Just print metaData[i] instead of metaData[0] and split each string by "[ ]+" (that means "1 or more spaces"):
metaData = metaLine.split("[ ]+");
As a result, you will get the following arrays:
[testTable2, Name, java.lang.Integer, TRUE, test]
[testTable2, age, java.lang.String, FALSE, test]
[testTable2, ID, java.lang.Integer, FALSE, test]
The code snippet that produces the preceding output:
while ((metaLine = metaRead.readLine()) != null) {
metaData = metaLine.split("[ ]+");
for (int i = 0; i < metaData.length; i++)
System.out.print(metaData[i] + " ");
System.out.println();
}
Also, here is the same task written with Java 8 and the Stream API:
List<String> collect = metaRead
.lines()
.flatMap(line -> Arrays.stream(line.split("[ ]+")))
.collect(Collectors.toList());
And, finally, there is the most straightforward way:
final int LINES = 3, WORDS = 5; // the sample file has 3 lines of 5 words each
String[] metaData = new String[LINES * WORDS]; // I don't like it
int i = 0;
while ((metaLine = metaRead.readLine()) != null) {
    for (String s : metaLine.split("[ ]+")) metaData[i++] = s;
}
In your code, change the following line inside the for loop:
System.out.println(metaData[0]);
to:
System.out.println(metaData[i]);
My answer may not fit your question completely, but as far as I can see, your file format is TSV or CSV.
Maybe you should consider using OpenCSV for your problem.
The library will handle the reading and splitting for you.
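If pulling in a library is not an option and the columns might be separated by tabs rather than single spaces, splitting on the regular expression "\\s+" covers both cases. The sketch below is an illustration only: the class MetaParser and the method parse are invented names, and it keeps one array per line instead of reusing a single metaData array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; names are illustrative only.
class MetaParser {

    // Splits every non-blank line on runs of whitespace (spaces or tabs).
    static List<String[]> parse(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            rows.add(trimmed.split("\\s+"));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> rows = parse(Arrays.asList(
                "testTable2 Name java.lang.Integer TRUE test",
                "testTable2\tage\tjava.lang.String\tFALSE\ttest"));
        for (String[] row : rows) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Each element of the returned list is the metaData array for one line, so rows.get(0)[1] is the column name of the first line.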

Reading and matching contents of two big files

I have two files each having the same format with approximately 100,000 lines. For each line in file one I am extracting the second component or column and if I find a match in the second column of second file, I extract their third components and combine them, store or output it.
Although my implementation works, the program runs extremely slowly: it takes more than an hour to iterate over the files, compare, and output all the results.
I am reading and storing the data of both files in ArrayLists, then iterating over those lists and doing the comparison. Below is my code. Is there a performance-related glitch, or is this just normal for such an operation?
Note: I was using String.split(), but I understand from another post that StringTokenizer is faster.
public ArrayList<String> match(String file1, String file2) throws IOException{
ArrayList<String> finalOut = new ArrayList<>();
try {
ArrayList<String> data = readGenreDataIntoMemory(file1);
ArrayList<String> data1 = readGenreDataIntoMemory(file2);
StringTokenizer st = null;
for(String line : data){
HashSet<String> genres = new HashSet<>();
boolean sameMovie = false;
String movie2 = "";
st = new StringTokenizer(line, "|");
//String line[] = fline.split("\\|");
String ratingInfo = st.nextToken();
String movie1 = st.nextToken();
String genreInfo = st.nextToken();
if(!genreInfo.equals("null")){
for(String s : genreInfo.split(",")){
genres.add(s);
}
}
StringTokenizer st1 = null;
for(String line1 : data1){
st1 = new StringTokenizer(line1, "|");
st1.nextToken();
movie2 = st1.nextToken();
String genreInfo2= st1.nextToken();
//If the movie name are similar then they should have the same genre
//Update their genres to be the same
if(!genreInfo2.equals("null") && movie1.equals(movie2)){
for(String s : genreInfo2.split(",")){
genres.add(s);
}
sameMovie = true;
break;
}
}
if(sameMovie){
finalOut.add(ratingInfo+""+movie1+""+genres.toString()+"\n");
}else if(sameMovie == false){
finalOut.add(line);
}
}
} catch (FileNotFoundException e) {
e.printStackTrace();
}
return finalOut;
}
I would use the Streams API
String file1 = "files1.txt";
String file2 = "files2.txt";
// get all the lines by movie name for each file.
Map<String, List<String[]>> map = Stream.of(Files.lines(Paths.get(file1)),
Files.lines(Paths.get(file2)))
.flatMap(p -> p)
.parallel()
.map(s -> s.split("[|]", 3))
.collect(Collectors.groupingByConcurrent(sa -> sa[1], Collectors.toList()));
// merge all the genres for each movie.
map.forEach((movie, lines) -> {
Set<String> genres = lines.stream()
.flatMap(l -> Stream.of(l[2].split(",")))
.collect(Collectors.toSet());
System.out.println("movie: " + movie + " genres: " + genres);
});
This has the advantage of being O(n) instead of O(n^2) and it's multi-threaded.
Do a hash join.
As of now you are doing a nested loop join, which is O(n^2); the hash join will be amortized O(n).
Put the contents of each file in a hash map, with key the field you want (second field).
Map<String,String> map1 = new HashMap<>();
// build the map from file1
Map<String,String> map2 = new HashMap<>();
// build the map from file2
Then do the hash join
for(String key1 : map1.keySet()){
if(map2.containsKey(key1)){
// do your thing you found the match
}
}
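A fuller sketch of the hash join for this particular problem might look as follows. It rests on several assumptions: the class GenreHashJoin and the method match are invented names, the lines are passed as lists instead of being read from files, each line has the form rating|movie|genres, and the output line is rebuilt with "|" and "," separators, which the question does not specify. Only the second file is put into a map; the first file is streamed against it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; names and the output format are illustrative only.
class GenreHashJoin {

    static List<String> match(List<String> file1Lines, List<String> file2Lines) {
        // Build phase: movie name -> genres of its first non-"null" entry in file 2.
        Map<String, String> genresByMovie = new HashMap<>();
        for (String line : file2Lines) {
            String[] parts = line.split("\\|", 3);
            if (parts.length == 3 && !parts[2].equals("null")) {
                genresByMovie.putIfAbsent(parts[1], parts[2]);
            }
        }
        // Probe phase: one constant-time lookup per line of file 1.
        List<String> out = new ArrayList<>();
        for (String line : file1Lines) {
            String[] parts = line.split("\\|", 3);
            String other = parts.length == 3 ? genresByMovie.get(parts[1]) : null;
            if (other == null) {
                out.add(line); // no match: keep the line unchanged
                continue;
            }
            // LinkedHashSet removes duplicate genres and keeps a stable order.
            Set<String> genres = new LinkedHashSet<>();
            if (!parts[2].equals("null")) {
                genres.addAll(Arrays.asList(parts[2].split(",")));
            }
            genres.addAll(Arrays.asList(other.split(",")));
            out.add(parts[0] + "|" + parts[1] + "|" + String.join(",", genres));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> merged = match(
                Arrays.asList("5|Alien|Horror", "3|Up|null"),
                Arrays.asList("1|Alien|SciFi,Horror", "2|Up|Animation"));
        for (String line : merged) {
            System.out.println(line);
        }
    }
}
```

With roughly 100,000 lines per file this does about 200,000 map operations instead of up to 10 billion string comparisons, which is where the speedup comes from.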

Dynamically creating a new instance in Java

I have a class called CD with the following private variables:
private String artist = "";
private String year = "";
private String albumName = "";
private ArrayList<String> songs = new ArrayList<String>();
This class is used to store input data that is in this format:
Led Zeppelin
1979 In Through the Outdoor
-In the Evening
-South Bound Saurez
-Fool in the Rain
-Hot Dog
-Carouselambra
-All My Love
-I'm Gonna Crawl
I have a CDParser class that is in charge of parsing the file called sample.db line by line to store it into our CD object. After parsing, the CD object, after initializing it with CD newCD = new CD() has the following structure:
artist = "Led Zeppelin"
year = "1979"
albumName = "In Through the Outdoor"
songs = {"-In the Evening", "-South Bound Saurez", "-Fool in the Rain", "-Hot Dog"}
Now.. For this project, sample.db contains many albums, which looks like the following:
Led Zeppelin
1979 In Through the Outdoor
-In the Evening
-South Bound Saurez
-Fool in the Rain
-Hot Dog
-Carouselambra
-All My Love
-I'm Gonna Crawl
Led Zeppelin
1969 II
-Whole Lotta Love
-What Is and What Should Never Be
-The Lemon Song
-Thank You
-Heartbreaker
-Living Loving Maid (She's Just a Woman)
-Ramble On
-Moby Dick
-Bring It on Home
Bob Dylan
1966 Blonde on Blonde
-Rainy Day Women #12 & 35
-Pledging My Time
-Visions of Johanna
-One of Us Must Know (Sooner or Later)
-I Want You
-Stuck Inside of Mobile with the Memphis Blues Again
-Leopard-Skin Pill-Box Hat
-Just Like a Woman
-Most Likely You Go Your Way (And I'll Go Mine)
-Temporary Like Achilles
-Absolutely Sweet Marie
-4th Time Around
-Obviously 5 Believers
-Sad Eyed Lady of the Lowlands
I have so far been able to parse all three different albums and save them into my CD object, but ran into a roadblock where I'm simply saving all three albums into the same newCD object.
My question is: is there a way to programmatically create my CD objects following the format newCD1, newCD2, newCD3, etc., as I parse sample.db?
What this means is, as I parse this particular file:
newCD1 would be the album In Through the Outdoor (and its respective private vars)
newCD2 would be the album II (and its respective private vars)
newCD3 would be the album Blonde on Blonde, and so on
Is this a smart way to do it? Or could you suggest a better way?
EDIT:
Attached is my parser code. ourDB is an ArrayList containing every line of sample.db:
CD newCD = new CD();
int line = 0;
for(String string : this.ourDB) {
if(line == ARTIST) {
newCD.setArtist(string);
System.out.println(string);
line++;
} else if(line == YEAR_AND_ALBUM_NAME){
String[] elements = string.split(" ");
String[] albumNameArr = Arrays.copyOfRange(elements, 1, elements.length);
String year = elements[0];
String albumName = join(albumNameArr, " ");
newCD.setYear(year);
newCD.setAlbumName(albumName);
System.out.println(year);
System.out.println(albumName);
line++;
} else if(line >= SONGS && !string.equals("")) {
newCD.setSong(string);
System.out.println(string);
line++;
} else if(string.isEmpty()){
line = 0;
}
}
You have a single CD object, so you keep overwriting it. Instead, you could hold a collection of CDs. E.g.:
List<CD> cds = new ArrayList<>();
CD newCD = new CD();
int line = 0;
for(String string : this.ourDB) {
if(line == ARTIST) {
newCD.setArtist(string);
System.out.println(string);
line++;
} else if(line == YEAR_AND_ALBUM_NAME){
String[] elements = string.split(" ");
String[] albumNameArr = Arrays.copyOfRange(elements, 1, elements.length);
String year = elements[0];
String albumName = join(albumNameArr, " ");
newCD.setYear(year);
newCD.setAlbumName(albumName);
System.out.println(year);
System.out.println(albumName);
line++;
} else if(line >= SONGS && !string.equals("")) {
newCD.setSong(string);
System.out.println(string);
line++;
} else if(string.isEmpty()){
// We're starting a new CD!
// Add the one we have so far to the list, and start afresh
cds.add(newCD);
newCD = new CD();
line = 0;
}
}
// Take care of the case the file doesn't end with a newline:
if (line != 0) {
cds.add(newCD);
}
The problem is that you're using the same object reference of CD to fill the values of the parse of the file.
Just make sure to initialize and store every instance of CD newCD every time you start parsing the content of a new album.
You may do the following:
List<CD> cdList = new ArrayList<>();
for (<some way to handle you're reading a new album entry from your file>) {
CD cd = new CD();
//method below parses the data in the db per album entry
//an album entry may contain several lines
parseData(cd, this.ourDB);
cdList.add(cd);
}
System.out.println(cdList);
Your current way to parse the file works but is not as readable as it should be. I would recommend using two loops:
List<CD> cdList = new ArrayList<>();
Iterator<String> yourDBIterator = this.ourDB.iterator();
//it will force to enter the first time
while (yourDBIterator.hasNext()) {
//do the parsing here...
CD cd = new CD();
//method below parses the data in the db per album entry
//an album entry may contain several lines
parseData(cd, yourDBIterator);
cdList.add(cd);
}
//...
public void parseData(CD cd, Iterator<String> it) {
    String string = it.next();
    int line = ARTIST;
    // an empty line (or the end of the input) ends the current album entry
    while (!"".equals(string)) {
        if (line == ARTIST) {
            cd.setArtist(string);
            System.out.println(string);
            line++;
        } else if(line == YEAR_AND_ALBUM_NAME){
            String[] elements = string.split(" ");
            String[] albumNameArr = Arrays.copyOfRange(elements, 1, elements.length);
            String year = elements[0];
            String albumName = join(albumNameArr, " ");
            cd.setYear(year);
            cd.setAlbumName(albumName);
            System.out.println(year);
            System.out.println(albumName);
            line++;
        } else if(line >= SONGS && !string.equals("")) {
            cd.setSong(string);
            System.out.println(string);
            line++;
        }
        if (it.hasNext()) {
            string = it.next();
        } else {
            string = "";
        }
    }
}
Then, your code
I suggest using the Builder design pattern to construct the CD object. If you always read the lines in the same order, it will not be complicated to implement and use. Good tutorial: http://www.javacodegeeks.com/2013/01/the-builder-pattern-in-practice.html
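As a rough idea of what that could look like, here is a minimal Builder sketch. It is not the asker's class: Cd is a stand-in for the CD class from the question, and the builder method names (artist, year, albumName, addSong, build) are assumptions chosen for this example. The parser would create one new Builder per album and call build() when the album entry ends.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the CD class from the question.
class Cd {
    private final String artist;
    private final String year;
    private final String albumName;
    private final List<String> songs;

    private Cd(Builder builder) {
        this.artist = builder.artist;
        this.year = builder.year;
        this.albumName = builder.albumName;
        // Copy the list so later changes to the builder do not affect this object.
        this.songs = Collections.unmodifiableList(new ArrayList<>(builder.songs));
    }

    String getArtist() { return artist; }
    String getYear() { return year; }
    String getAlbumName() { return albumName; }
    List<String> getSongs() { return songs; }

    static class Builder {
        private String artist = "";
        private String year = "";
        private String albumName = "";
        private final List<String> songs = new ArrayList<>();

        Builder artist(String value) { this.artist = value; return this; }
        Builder year(String value) { this.year = value; return this; }
        Builder albumName(String value) { this.albumName = value; return this; }
        Builder addSong(String value) { this.songs.add(value); return this; }

        Cd build() { return new Cd(this); }
    }

    public static void main(String[] args) {
        Cd cd = new Cd.Builder()
                .artist("Led Zeppelin")
                .year("1969")
                .albumName("II")
                .addSong("-Whole Lotta Love")
                .addSong("-Ramble On")
                .build();
        System.out.println(cd.getArtist() + " " + cd.getYear() + " " + cd.getAlbumName() + " " + cd.getSongs());
    }
}
```

Because each built object is immutable, an album cannot be overwritten by the lines of the next one, which is the problem described in the question.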
