Building Object collection taking very long - java

What I am trying to do is build a collection of UserObjects from an ArrayList<String> that I've read via a BufferedReader.
UserObject simply consists of these fields:
int UserId
ArrayList<Integer> AssociatesId
My current code uses a BufferedReader to read in file.edgelist and build an ArrayList<String> whose entries have this format: "1 1200"
I split each string into a String[] on its whitespace. If no UserObject with that UserId exists yet, I build a new one with UserId = 1 and a fresh ArrayList<Integer>; otherwise I add the integer in the second element to the existing UserObject with the same UserId.
My problem is that file.edgelist has around 20,000,000 entries, and while the BufferedReader takes under 10 seconds to read the file, building the collection of UserObjects takes forever. In fact, I haven't even reached the end of the file because it takes so long. I can confirm that the entries are being built correctly: running under a debugger with occasional breakpoints shows the UserId increasing and the UserObjects' AssociatesId collections filling with data.
Is there a quicker and/or better way to build this collection?
This is currently my code:
private ArrayList<UserObject> tempUsers;

public Utilities() {
    tempUsers = new ArrayList<UserObject>();
}

// Reads the file through a BufferedReader and returns an ArrayList of
// strings formatted like "1 1200"
public ArrayList<String> ReadFile() {
    BufferedReader reader = null;
    ArrayList<String> userStr = new ArrayList<String>();
    try {
        File file = new File("file.edgelist");
        reader = new BufferedReader(new FileReader(file));
        String line;
        while ((line = reader.readLine()) != null) {
            userStr.add(line);
        }
        return userStr;
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            if (reader != null) reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return null;
}
// Where the problem actually lies
public ArrayList<UserObject> BuildUsers(ArrayList<String> userStrings) {
    for (String s : userStrings) {
        String[] ids = s.split("\\s+");
        UserObject exist = getUser(Integer.parseInt(ids[0]));
        if (exist == null) { // builds a new UserObject if it doesn't exist in tempUsers
            UserObject newUser = new UserObject(Integer.parseInt(ids[0]));
            newUser.associate(Integer.parseInt(ids[1]));
            tempUsers.add(newUser);
        } else { // otherwise adds the "associate" id to the UserObject's AssociatesId collection
            exist.associate(Integer.parseInt(ids[1]));
        }
    }
    return tempUsers;
}

// Helper method that uses a Stream to find and return an existing UserObject
private UserObject getUser(int id) {
    if (tempUsers.isEmpty()) return null;
    try {
        return tempUsers.stream().filter(t -> t.equals(new UserObject(id))).findFirst().get();
    } catch (NoSuchElementException ex) {
        return null;
    }
}

Every time you call getUser, you iterate through the whole list to check whether the given user exists. This is very inefficient, since the list keeps growing (each lookup is linear in the worst case, making the whole build quadratic). You might want to replace it with a HashMap, whose lookups take constant time on average.
private Map<Integer, UserObject> tempUsers = new HashMap<>();

// Helper method that returns the existing UserObject for an id, or null
private UserObject getUser(int id) {
    return tempUsers.get(id);
}
Moreover, creating the intermediate ArrayList<String> userStr with 20,000,000 entries is completely unnecessary and wastes a lot of memory. You should create UserObject instances as you read lines from the reader.
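For illustration, here is a minimal sketch combining both suggestions (the UserObject stub is an assumed shape matching the question, and a StringReader stands in for the real new FileReader("file.edgelist")):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UserLoader {
    // Minimal stand-in for the asker's UserObject (assumed shape).
    static class UserObject {
        final int userId;
        final List<Integer> associatesId = new ArrayList<>();
        UserObject(int userId) { this.userId = userId; }
        void associate(int id) { associatesId.add(id); }
    }

    // One pass over the reader; the HashMap lookup replaces the linear getUser scan.
    static Map<Integer, UserObject> load(BufferedReader reader) {
        Map<Integer, UserObject> users = new HashMap<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] ids = line.split("\\s+");
                int userId = Integer.parseInt(ids[0]);
                // Creates the UserObject on first sight of an id, reuses it afterwards.
                users.computeIfAbsent(userId, UserObject::new)
                     .associate(Integer.parseInt(ids[1]));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return users;
    }

    public static void main(String[] args) {
        BufferedReader r = new BufferedReader(new StringReader("1 1200\n1 1300\n2 99"));
        Map<Integer, UserObject> users = load(r);
        System.out.println(users.get(1).associatesId); // [1200, 1300]
    }
}
```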

Wow, you are just wasting memory and performance there.
First, don't load the entire file into memory as a List<String>. That is just a total waste of memory. Load the file directly into UserObject objects.
Next, don't store them as List<UserObject> and perform a sequential search for object by id. That's just .... sllloooooooooowwwww....
You should store them in a Map<Integer, UserObject> for fast access by id.
Actually, you don't even need UserObject. From what you've said, you just need a Map<Integer, List<Integer>>, which is also called a MultiMap. It's simple enough to do yourself, or you can find third-party libraries with MultiMap implementations.
Also, don't use split() if you know each line will contain exactly one space. Use indexOf() and substring() instead.
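A sketch of that Map<Integer, List<Integer>> approach, parsing with indexOf()/substring() rather than the regex-based split() (class and method names are made up, and a StringReader stands in for the file):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EdgeListMultiMap {
    // Parses "userId associateId" lines into a multimap, no UserObject needed.
    static Map<Integer, List<Integer>> load(BufferedReader reader) {
        Map<Integer, List<Integer>> edges = new HashMap<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // Exactly one space per line, so indexOf/substring is enough.
                int space = line.indexOf(' ');
                int user = Integer.parseInt(line.substring(0, space));
                int associate = Integer.parseInt(line.substring(space + 1));
                edges.computeIfAbsent(user, k -> new ArrayList<>()).add(associate);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return edges;
    }

    public static void main(String[] args) {
        BufferedReader r = new BufferedReader(new StringReader("1 1200\n1 1300\n2 99"));
        System.out.println(load(r).get(1)); // [1200, 1300]
    }
}
```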

Your code fits the definition of a "pipeline", and thus could benefit enormously from a more judicious use of the Streams API. For example, you don't need to read the whole file into memory; just use Files.lines to get a Stream<String> of every line in the file. Furthermore, you could do your parsing like this:
// Where the problem actually lies
public ArrayList<UserObject> BuildUsers(Stream<String> userStrings) {
    java.util.Map<Integer, java.util.Optional<UserObject>> users = userStrings // Stream<String>
        .map(str -> str.split("\\s+"))                                         // Stream<String[]>
        .map(ids -> {
            UserObject newUser = new UserObject(Integer.parseInt(ids[0]));
            newUser.associate(Integer.parseInt(ids[1]));
            return newUser;
        })                                  // Stream<UserObject>, all new (maybe with duplicate ids)
        .collect(Collectors.groupingBy(
            uObj -> uObj.getId(),           // whatever returns the "ids[0]" value
            java.util.HashMap::new,
            Collectors.reducing((uo1, uo2) -> {
                // This lambda "merges" uo2 into uo1
                uo2.getAssociates().forEach(uo1::associate);
                return uo1;
            })));
    // reducing without an identity wraps each value in an Optional, which is
    // always present here because every group contains at least one element.
    ArrayList<UserObject> result = new ArrayList<>();
    users.values().forEach(opt -> result.add(opt.get()));
    return result;
}
Here I've made up the getId and getAssociates methods on UserObject to return the values that originally came from the elements of the ids array. This function first splits each line into a String array, then parses each of those 2-element arrays into a new UserObject instance. The final collectors perform two functions:
Grouping by the id property, which on its own would give you a Map<Integer, List<UserObject>> collecting all the UserObjects that share the same primary id.
Reducing (squashing) the several UserObject instances sharing a primary id into a single instance (via Collectors.reducing), so that in the end you get one UserObject per id. The function passed to reducing takes two UserObject instances and returns one that contains the associate ids of both of its "parents".
Finally, since you apparently want an ArrayList of the values, the code just takes them from the map and dumps them into the desired container type.
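As a side note, Collectors.reducing without an identity wraps each group's result in an Optional. One way to avoid the unwrapping is Collectors.toMap with a merge function. Here's a sketch, using the same assumed getId()/getAssociates() accessors and a stub UserObject so it compiles on its own:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ToMapMerge {
    // Assumed shape of the asker's UserObject, with the made-up accessors.
    static class UserObject {
        private final int id;
        private final List<Integer> associates = new ArrayList<>();
        UserObject(int id) { this.id = id; }
        int getId() { return id; }
        List<Integer> getAssociates() { return associates; }
        void associate(int a) { associates.add(a); }
    }

    static Map<Integer, UserObject> build(Stream<String> userStrings) {
        return userStrings
            .map(str -> str.split("\\s+"))
            .map(ids -> {
                UserObject u = new UserObject(Integer.parseInt(ids[0]));
                u.associate(Integer.parseInt(ids[1]));
                return u;
            })
            // toMap's merge function folds duplicate ids together directly,
            // so the result is Map<Integer, UserObject>, no Optional involved.
            .collect(Collectors.toMap(
                UserObject::getId,
                Function.identity(),
                (uo1, uo2) -> { uo2.getAssociates().forEach(uo1::associate); return uo1; }));
    }

    public static void main(String[] args) {
        Map<Integer, UserObject> users = build(Stream.of("1 1200", "1 1300", "2 99"));
        System.out.println(users.get(1).getAssociates()); // [1200, 1300]
    }
}
```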

Related

Retrieving Values from having a particular property from a Map using Java 8 Stream

I have a class UserCourseEntity with a property userId
@AllArgsConstructor
@Getter
@ToString
public static class UserCourseEntity {
    private String userId;
}
And I have a map with UserCourseEntity objects as values.
public final Map<String, UserCourseEntity> userCourses;
Method getUserCoursesByUserID receives userId property of the UserCourseEntity as a parameter.
I want to check whether there are values in the userCourses map whose userId matches the given id in a case-insensitive manner (i.e. using equalsIgnoreCase()).
If there are such values, I need to store them in a list, and throw an exception otherwise.
I wonder: is it possible to reimplement this code using streams?
public List<UserCourseEntity> getUserCoursesByUserID(String userId) {
    List<UserCourseEntity> list = new ArrayList<>();
    for (Map.Entry<String, UserCourseEntity> entry : userCourses.entrySet()) {
        UserCourseEntity userCourseEntityValue = entry.getValue();
        String key = entry.getKey();
        boolean isExist = userCourseEntityValue.getUserId().equalsIgnoreCase(userId);
        if (!isExist) {
            continue;
        } else {
            if (userCourseEntityValue.getUserId().equalsIgnoreCase(userId)) {
                list.add(userCourses.get(key));
            }
        }
    }
    if (list.isEmpty()) {
        logger.error("No data found");
        throw new SpecificException("No data found with the given details");
    }
    return list;
}
We can achieve this using streams.
For that, we create a stream over the map entries, filter the entries whose values have a matching userId, transform the stream by extracting the value from each entry, and collect the values into a list.
Note: throwing the exception from inside the stream pipeline would be awkward, so the if-statement responsible for that stays where it is.
Here is how it can be implemented:
public List<UserCourseEntity> getUserCoursesByUserID(String userId) {
    List<UserCourseEntity> courses = userCourses.entrySet().stream()
        .filter(entry -> entry.getValue().getUserId().equalsIgnoreCase(userId))
        .map(Map.Entry::getValue)
        .collect(Collectors.toList()); // or .toList() for Java 16+
    if (courses.isEmpty()) {
        logger.error("No data found");
        throw new SpecificException("No data found with the given details");
    }
    return courses;
}
Side note: from a class-design perspective, it would be cleaner to have a user object responsible for storing and manipulating (retrieving and changing) the information about its courses.
You could then maintain a collection of users, for instance a HashMap associating each id with a user. That would give you convenient access to a user's list of courses.
Iterating over a HashMap's entries isn't the best way to use it.
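A rough sketch of that design (all names here are made up for illustration):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UserRegistryDemo {
    // Hypothetical user object that owns its own course list.
    static class User {
        private final String id;
        private final List<String> courses = new ArrayList<>();
        User(String id) { this.id = id; }
        void addCourse(String course) { courses.add(course); }
        List<String> getCourses() { return List.copyOf(courses); }
    }

    public static void main(String[] args) {
        // The registry maps an id directly to its User: O(1) lookup,
        // no scan over map entries.
        Map<String, User> usersById = new HashMap<>();
        usersById.computeIfAbsent("u1", User::new).addCourse("Math");
        usersById.computeIfAbsent("u1", User::new).addCourse("Physics");
        System.out.println(usersById.get("u1").getCourses()); // [Math, Physics]
    }
}
```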

Importing a Large Amount of Data and Searching Efficiently

I'm currently writing a program that takes in two CSVs - one containing database keys (and other information irrelevant to the current issue), the other being an asset manifest. The program checks the database key from the first CSV, queries an online database to retrieve the asset key, then gets the asset status from the second CSV. (This is a workaround to a stupid API issue.)
My problem is that while the CSV that is being iterated over is relatively short - only about 300 lines long usually - the other is an asset manifest that is easily 10000 lines long (and sorted, though not by the key I can obtain from the first CSV). I obviously don't want to iterate over the entire asset manifest for every single input line, since that will take roughly 10 eternities.
I'm a fairly inexperienced programmer, so I only know of sorting/searching algorithms, and I definitely don't know what would be the one to use for this. What algorithm would be the most efficient? Is there a way to "batch-query" the manifest for all of the assets listed in the input CSV that would be faster than searching the manifest individually for each key? Or should I use a tree or hashtable or something else I heard mentioned in other SE threads? I don't know anything about the performance implications of any of these.
I can format the manifest as needed when it's input (it's just copy-pasted into a GUI), so I guess I could iterate over the entire manifest when it's input and make a hashtable of key:line pairs and then search that? Or I could turn it into a 2D array and just search the specified index? Those are all I can think of.
Problem is, I don't know how much time computer operations like that take, and if that would just waste time or actually improve performance.
P.s. I'm using Java for this currently since it's all I know, but if another language would be faster then I'm all ears.
The simple solution is to create a HashMap: iterate through one of the files and add each of its lines to the HashMap (with the corresponding key and value), then iterate through the other file and check whether the HashMap contains each key. If it does, add the data to another HashMap, and after the iteration return that second HashMap.
Imagine we have a test1.csv file with content in the form key,name,family as below:
5000,ehsan,tashkhisi
2,ali,lllll
3,amel,lllll
1,azio,skkk
And a test2.csv file with content in the form key,status like below:
1000,status1
1,status2
5000,status3
4000,status4
4001,status1
4002,status3
5,status1
We want to have output like this:
1 -> status2
5000 -> status3
Simple code will be like below:
Java 8 Stream:
private static Map<String, String> findDataInTwoFilesJava8() throws IOException {
    Map<String, String> map =
        Files.lines(Paths.get("/tmp/test2.csv")).map(a -> a.split(","))
            .collect(Collectors.toMap(a -> a[0], a -> a[1]));
    return Files.lines(Paths.get("/tmp/test1.csv")).map(a -> a.split(","))
        .filter(a -> map.containsKey(a[0]))
        .collect(Collectors.toMap(a -> a[0], a -> map.get(a[0])));
}
Simple Java:
private static Map<String, String> findDataInTwoFiles() throws IOException {
    String line;
    Map<String, String> map = new HashMap<>();
    BufferedReader br = new BufferedReader(new FileReader("/tmp/test2.csv"));
    while ((line = br.readLine()) != null) {
        String[] lineData = line.split(",");
        map.put(lineData[0], lineData[1]);
    }
    br.close();

    Map<String, String> resultMap = new HashMap<>();
    br = new BufferedReader(new FileReader("/tmp/test1.csv"));
    while ((line = br.readLine()) != null) {
        String key = line.split(",")[0];
        if (map.containsKey(key))
            resultMap.put(key, map.get(key));
    }
    br.close();
    return resultMap;
}

How do I handle duplicate keys in HashMaps in Java?

I have a HashMap. I wrote a method for reading files and printing them, but I just realized that a HashMap does not tolerate duplicate keys, and my file contains them, so I need to do something about it, e.g. save the same key with some kind of character appended (like _ or similar) so the keys differ from each other. I can't come up with a solution (maybe I could catch an exception, or just write an if-block). Could you please help me?
public static HashMap<String, String> hashmapReader(File test3) {
    HashMap<String, String> data = new HashMap<>();
    try (BufferedReader hmReader = new BufferedReader(new FileReader(test3))) {
        String line;
        while ((line = hmReader.readLine()) != null) {
            String[] columns = line.split("\t");
            String key = columns[0];
            String value = columns[1];
            data.put(key, value);
        }
    } catch (Exception e) {
        System.out.println("Something went wrong");
    }
    return data;
}
You can add a check on the key to see whether it already exists in the HashMap data.
To do this you can use the get(key) method of the HashMap class, which returns null if the key doesn't exist:
if (data.get(key) != null)
    key = key + "_"; // rename the duplicate key before inserting
data.put(key, value);
If the key already exists (get didn't return null), you can change its name by adding a character at the end, e.g. "_" as you said.
EDIT: Another answer pointed out a problem with this approach: what if there are more than two identical keys?
For that reason, I recommend following that solution instead of mine.
To achieve what you actually ask for:
Before your put line:
while (data.containsKey(key)) key += "_";
data.put(key, value);
This will keep on checking the map to see if key already exists, and if it does, it adds the _ to the end, and tries again.
You can do these two lines in one:
while (data.putIfAbsent(key, value) != null) key += "_";
This does basically the same thing, but avoids looking the key up twice in the case where it isn't found (and thus the value should be inserted).
However, consider whether this is actually the best thing to do: how will you then look things up by "key", if you've essentially made the keys up while reading them?
You can keep multiple values per key by using a value type which stores multiple values, e.g. List<String>.
HashMap<String, List<String>> data = new HashMap<>();
// ...
data.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
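A small self-contained demo of that multi-value approach (the input lines are made up, mimicking the tab-separated file from the question):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiValueDemo {
    // Groups tab-separated "key<TAB>value" lines; duplicate keys simply
    // accumulate values instead of overwriting each other.
    static Map<String, List<String>> group(String[] lines) {
        Map<String, List<String>> data = new HashMap<>();
        for (String line : lines) {
            String[] columns = line.split("\t");
            // computeIfAbsent creates the list on first sight of a key.
            data.computeIfAbsent(columns[0], k -> new ArrayList<>()).add(columns[1]);
        }
        return data;
    }

    public static void main(String[] args) {
        String[] lines = { "a\t1", "a\t2", "b\t3" };
        System.out.println(group(lines).get("a")); // [1, 2]
    }
}
```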

How to parse a CSV with multiple key-value pairs?

I have a CSV in this format:
"Account Name","Full Name","Customer System Name","Sales Rep"
"0x7a69","Mike Smith","0x7a69","Tim Greaves"
"0x7a69","John Taylor","0x7a69","Brian Anthony"
"Apple","Steve Jobs","apple","Anthony Michael"
"Apple","Steve Jobs","apple","Brian Anthony"
"Apple","Tim Cook","apple","Tim Greaves"
...
I would like to parse this CSV (using Java) so that it becomes:
"Account Name","Full Name","Customer System Name","Sales Rep"
"0x7a69","Mike Smith, John Taylor","0x7a69","Tim Greaves, Brian Anthony"
"Apple","Steve Jobs, Tim Cook","apple","Anthony Michael, Brian Anthony, Tim Greaves"
Essentially I just want to condense the CSV so that there is one entry per account/company name.
Here is what I have so far:
String csvFile = "something.csv";
String line = "";
String cvsSplitBy = ",";

List<String> accountList = new ArrayList<String>();
List<String> nameList = new ArrayList<String>();
List<String> systemNameList = new ArrayList<String>();
List<String> salesList = new ArrayList<String>();

try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
    while ((line = br.readLine()) != null) {
        // use comma as separator
        String[] csv = line.split(cvsSplitBy);
        accountList.add(csv[0]);
        nameList.add(csv[1]);
        systemNameList.add(csv[2]);
        salesList.add(csv[3]);
    }
} catch (IOException e) {
    e.printStackTrace();
}
So I was thinking of adding them all to their own lists, then looping through all of the lists and comparing the values, but I can't wrap my head around how that would work. Any tips or words of advice are much appreciated. Thanks!
By analyzing your requirements you can get a better idea of the data structures to use. Since you need to map keys (account/company) to values (name/rep) I would start with a HashMap. Since you want to condense the values to remove duplicates you'll probably want to use a Set.
I would have a Map<Key, Data> with
public class Key {
    private String account;
    private String companyName;
    // Getters/setters/equals/hashCode
}

public class Data {
    private Key key;
    private Set<String> names = new HashSet<>();
    private Set<String> reps = new HashSet<>();

    public void addName(String name) {
        names.add(name);
    }

    public void addRep(String rep) {
        reps.add(rep);
    }

    // Additional getters/setters/equals/hashCode
}
Once you have your data structures in place, you can do the following to populate the data from your CSV and output it to its own CSV (in pseudocode)
Loop each line in CSV
Build Key from account/company
Try to get data from Map
If Data not found
Create new data with Key and put key -> data mapping in map
add name and rep to data
Loop values in map
Output to CSV
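One possible rendering of that pseudocode in Java, simplified to key the map by account name alone (on the assumption, from the sample data, that it identifies the group; all class and method names are made up):

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CsvCondenser {
    // Per-account accumulator; LinkedHashSet drops duplicates but keeps
    // first-seen order, matching the desired condensed output.
    static class Data {
        final Set<String> names = new LinkedHashSet<>();
        final Set<String> systemNames = new LinkedHashSet<>();
        final Set<String> reps = new LinkedHashSet<>();
    }

    // Rows are assumed pre-split into [account, name, system name, sales rep].
    static Map<String, Data> condense(List<String[]> rows) {
        Map<String, Data> byAccount = new LinkedHashMap<>(); // keeps input order
        for (String[] row : rows) {
            Data d = byAccount.computeIfAbsent(row[0], k -> new Data());
            d.names.add(row[1]);
            d.systemNames.add(row[2]);
            d.reps.add(row[3]);
        }
        return byAccount;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] { "Apple", "Steve Jobs", "apple", "Anthony Michael" },
            new String[] { "Apple", "Steve Jobs", "apple", "Brian Anthony" },
            new String[] { "Apple", "Tim Cook", "apple", "Tim Greaves" });
        Data apple = condense(rows).get("Apple");
        System.out.println(String.join(", ", apple.names)); // Steve Jobs, Tim Cook
        System.out.println(String.join(", ", apple.reps));  // Anthony Michael, Brian Anthony, Tim Greaves
    }
}
```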
Well, I probably would create a class, say Account, with the attributes accountName, fullName, customerSystemName, and salesRep. Then I would define an empty ArrayList of type Account and loop over the lines read. For every line I would create a new object of this class, set the corresponding attributes, and add the object to the list. But before creating the object, I would iterate over the objects already in the list to see whether one already has this company name; if so, instead of creating a new object, I would just update the salesRep attribute of the existing one by appending the new value, separated by a comma.
I hope this helps :)

My arraylist is only outputting the last value

I created a HashMap to store a text file with the columns of information. I compared the key to a specific name and stored the values of the HashMap into an ArrayList. When I try to println my ArrayList, it only outputs the last value and leaves out all the other values that match that key.
This isn't my entire code just my two loops that read in the text file, stores into the HashMap and then into the ArrayList. I know it has something to do with my loops.
Did some editing and got it to output, but all my values are displayed multiple times.
My output looks like this.
North America:
[ Anguilla, Anguilla, Antigua and Barbuda, Antigua and Barbuda, Antigua and Barbuda, Aruba, Aruba, Aruba,
HashMap<String, String> both = new HashMap<String, String>();
ArrayList<String> sort = new ArrayList<String>();
//ArrayList<String> sort2 = new ArrayList<String>();
// We need a try-catch block so we can handle any potential IO errors
try {
    try {
        inputStream = new BufferedReader(new FileReader(filePath));
        String lineContent = null;
        // Loop will iterate over each line within the file.
        // It will stop when no new lines are found.
        while ((lineContent = inputStream.readLine()) != null) {
            String column[] = lineContent.split(",");
            both.put(column[0], column[1]);
            Set set = both.entrySet();
            // Get an iterator
            Iterator i = set.iterator();
            // Display elements
            while (i.hasNext()) {
                Map.Entry me = (Map.Entry) i.next();
                if (me.getKey().equals("North America")) {
                    String value = (String) me.getValue();
                    sort.add(value);
                }
            }
        }
        System.out.println("North America:");
        System.out.println(sort);
        System.out.println("\n");
    }
Map keys need to be unique, so your code is working according to spec.
If you need to keep many values for a key, you can use
Map<Key, List<T>>
where T here is String (and instead of a List you can use any collection).
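A minimal sketch of that, with made-up input lines in the question's "continent,country" format:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ContinentMap {
    // Each continent maps to the list of its countries, so duplicate
    // keys accumulate instead of overwriting each other.
    static Map<String, List<String>> load(String[] lines) {
        Map<String, List<String>> both = new HashMap<>();
        for (String line : lines) {
            String[] column = line.split(",");
            both.computeIfAbsent(column[0], k -> new ArrayList<>()).add(column[1]);
        }
        return both;
    }

    public static void main(String[] args) {
        String[] lines = {
            "North America,Anguilla",
            "North America,Aruba",
            "Europe,Austria"
        };
        System.out.println(load(lines).get("North America")); // [Anguilla, Aruba]
    }
}
```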
A few things seem wrong with your code:
You are iterating over the Map's entry set just to get one value; you could simply use the following code:
if (both.containsKey("North America"))
    sort.add(both.get("North America"));
It seems "North America" can appear more than once in your input file, but you are storing it in a Map, so each time you store a new value for "North America", it overwrites the current value.
I don't know what the type of sort is, but what System.out.println(sort) prints depends on the toString() implementation of that type.
If you want more help, you may want to provide us with the following:
a sample of the input file
the declaration of sort
a sample of the output
what you want to obtain
