How do I read strings from a text file and store them in a HashMap?
The file contains two columns.
File is like:
FirstName LastName
Pranay Suyash and so on...
Here's one way:
import java.io.*;
import java.util.*;

class Test {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner scanner = new Scanner(new FileReader("filename.txt"));
        HashMap<String, String> map = new HashMap<String, String>();
        while (scanner.hasNextLine()) {
            String[] columns = scanner.nextLine().split(" ");
            map.put(columns[0], columns[1]);
        }
        System.out.println(map);
    }
}
Given input:
somekey somevalue
someotherkey someothervalue
this prints
{someotherkey=someothervalue, somekey=somevalue}
If your lines look different, I suggest you either fetch columns[0] and columns[1] and do whatever string manipulation you need, or, if you're comfortable with regular expressions, use Pattern / Matcher to match each line against a pattern and get the content from the capture groups.
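For example, a minimal Pattern / Matcher variant (reusing the same two-column "filename.txt" as above; the pattern is only illustrative and you would adapt it to your real lines):

import java.io.FileReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexColumns {
    public static void main(String[] args) throws Exception {
        // Two whitespace-separated columns per line; adjust the pattern to your format
        Pattern p = Pattern.compile("(\\S+)\\s+(\\S+)");
        Map<String, String> map = new HashMap<>();
        try (Scanner scanner = new Scanner(new FileReader("filename.txt"))) {
            while (scanner.hasNextLine()) {
                Matcher m = p.matcher(scanner.nextLine());
                if (m.matches()) {
                    map.put(m.group(1), m.group(2)); // group 1 = key, group 2 = value
                }
            }
        }
        System.out.println(map);
    }
}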
If you want to map each row of the two columns, you can make the first column's value the key and the second column's value the value. Keys must be unique in a HashMap, though, so this approach only works if the first-column values are unique:
Map<String, String> map = new HashMap<String, String>();
map.put(firstColVal, secondColVal);
Just in case
your keys (first column) don't contain spaces and
your columns are separated by either a :, a =, or a whitespace character (except newline),
then this may work:
Map<Object, Object> map = new Properties();
((Properties) map).load(new FileReader("inputfile.txt"));
Just saw your sample input... You shouldn't put that data in a map unless it is guaranteed that all first names are unique.
Otherwise this will happen:
map.put("Homer", "Simpson"); // new key/value pair
map.put("Bart", "Simpson"); // new key/value pair
map.put("Homer", "Johnsson"); // value for "Homer" is replaced with "Johnsson"
System.out.println(map.get("Homer")); // guess what happens..
Related
I am asked to create a word vector space from a CSV file, so I need to extract words and their vectors (each of size 57) into a dictionary so that I can reuse it for my future operations.
My CSV format is giving me a lot of problems because it's basically text with a key and doubles, all separated by spaces, and I haven't been able to separate the string and double parts correctly so far.
So do you have any idea how to parse this file into a dictionary containing (key, vector) entries?
Thanks a lot.
Here is a sample of the CSV file:
key1 4.0966564 7.963437 -2.1844673 1.9319566 -0.04495791 2.454401 3.1006012 -0.3813638 1.567303 -2.2067556 3.44506744 -4.382278 4.1457844 2.342756 -2.7707205 3.5015 2.5717492 -2.6846366...
key2 -3.968007 0.86151505 0.06163538 1.918614 0.34340435 -1.5178788 1.3857365 0.230331 0.7025755 -2.6575062 -0.7426953 3.1636698 2.8441591 0.4522623 3.3907628 2.425691 -1.2052362....
.
.
.
This data structure is called a multi-map: a key can have multiple values.
You can find examples in libraries.
If you'd rather not have the dependency, and wish to write your own, it might look like this:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiMap {

    private Map<String, List<Double>> multi = new HashMap<>();

    // Append newValue to the list for key, creating the list on first use
    public void put(String key, Double newValue) {
        if (newValue != null) {
            List<Double> values = (this.multi.containsKey(key) ? this.multi.get(key) : new ArrayList<>());
            values.add(newValue);
            this.multi.put(key, values);
        }
    }
}
It's possible to use generics, but I'm too lazy to bother right now. This example is correct for your narrow use case.
Split each line into tokens by splitting at regex "\\s+". The first value is the key; iterate over all the others to add them to the multi-map.
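A rough sketch of that loading step, reusing the MultiMap class above ("vectors.csv" is just a placeholder file name):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class LoadVectors {
    public static void main(String[] args) throws IOException {
        MultiMap multiMap = new MultiMap();
        for (String line : Files.readAllLines(Paths.get("vectors.csv"))) {
            String[] tokens = line.trim().split("\\s+");
            String key = tokens[0];                           // first token is the key
            for (int i = 1; i < tokens.length; i++) {
                multiMap.put(key, Double.valueOf(tokens[i])); // the rest are the vector components
            }
        }
    }
}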
You can do something like this:
String line = "key1 4.0966564 7.963437";
String[] parts = line.split(" ");
String key = parts[0];
ArrayList<Double> values = new ArrayList<Double>();
for (int i = 1; i < parts.length; i++) {
    String doubleAsString = parts[i];
    values.add(Double.valueOf(doubleAsString));
}
And then add these elements to your map.
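For example, assuming you want one entry per key holding its whole vector:

Map<String, List<Double>> vectors = new HashMap<String, List<Double>>();
vectors.put(key, values); // one entry per key, holding its full vector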
So I'm trying to retrieve specific substrings from the values in a HashMap constructed like this:
HashMap<ID, "Home > Recipe > Main Dish > Chicken > Chicken Breasts">
which is passed from a different method that returns a HashMap.
In the above example, I need to retrieve Chicken.
Thus far, I have..
public static ArrayList<String> generalize() {
    HashMap<String, String> items = new HashMap<>();
    ArrayList<String> cats = new ArrayList<>();
    items = RecSys.readInItemProfile("PATH", 0, 1);

    for (String w : items.values()) {
        cats.add(w);
    }

    for (String w : cats) {
        int e = w.indexOf('>', 1 + w.indexOf('>', 1 + w.indexOf('>')));
        String k = w.substring(e + 1);
        System.out.print(k);
        e = 0;
    }

    System.out.println("k" + cats);
    return cats;
}
Where I try to reset e for each iteration (I know it's redundant, but it was just to test).
In my dataset, the first k-v pair is
3880=Home > Recipes > Main Dish > Pasta,
My output is
Pasta
Which is ok. If there are more than 3x ">", it'll return all following categories. Optimally it wouldn't do that, but it's ok if it does. However, further down the line, it (seemingly) randomly returns
Home > Recipe
Along with the rest of the data...
This happens at the 6th loop, I believe.
Any help is greatly appreciated..
Edit:
To clarify, I have a .csv file containing 3 columns, of which 2 are used in this function (ID and Category). These are passed to this function by a read method in another class.
What I need to do is extract a generalized description of each category, which in all cases is the third instance of category specification (that is, always between the third and fourth ">" in every k-v pair).
My idea was to simply put all values in an arraylist, and for every value extract a string from between the third and fourth ">".
I recommend using the following map:
Map<Integer, List<String>> map = new HashMap<>();
String[] vals = new String[] { "Home", "Recipe", "Main Dish", "Chicken", "Chicken Breasts" };
map.put(1, Arrays.asList(vals));
Then, if you need to find a given value in your original string using an ID, you can simply call ArrayList#get() at a certain position. If you don't care at all about order, then a map of integers to sets might make more sense here.
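For instance, with the sample map above, the category the question asks for ("Chicken") would sit at index 3:

List<String> categories = map.get(1);
System.out.println(categories.get(3)); // prints "Chicken" for the sample value above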
If you can, change your data structure to a HashMap<Integer, List<String>> or HashMap<Integer, String[]>. It's better to store the categories (by cats you mean categories, right?) in a collection instead of a string.
Then you can easily get the third item.
If this is not possible, you need to do some debugging. Start by printing every input and output pair and find out which input caused the unexpected output. Your indexOf method seems to work at first glance.
Alternatively, try this regex method on each value w inside your loop:
String k = w.replaceAll("(?:[^>]+\\s*>\\s*){3}([^>]+).*", "$1");
System.out.println(k);
The regex basically looks for a xxx > yyy > zzz > aaa ... pattern and replaces that pattern with aaa (whatever that is in the original string).
I have a CSV in this format:
"Account Name","Full Name","Customer System Name","Sales Rep"
"0x7a69","Mike Smith","0x7a69","Tim Greaves"
"0x7a69","John Taylor","0x7a69","Brian Anthony"
"Apple","Steve Jobs","apple","Anthony Michael"
"Apple","Steve Jobs","apple","Brian Anthony"
"Apple","Tim Cook","apple","Tim Greaves"
...
I would like to parse this CSV (using Java) so that it becomes:
"Account Name","Full Name","Customer System Name","Sales Rep"
"0x7a69","Mike Smith, John Taylor","0x7a69","Tim Greaves, Brian Anthony"
"Apple","Steve Jobs, Tim Cook","apple","Anthony Michael, Brian Anthony, Tim Greaves"
Essentially I just want to condense the CSV so that there is one entry per account/company name.
Here is what I have so far:
String csvFile = "something.csv";
String line = "";
String cvsSplitBy = ",";

List<String> accountList = new ArrayList<String>();
List<String> nameList = new ArrayList<String>();
List<String> systemNameList = new ArrayList<String>();
List<String> salesList = new ArrayList<String>();

try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
    while ((line = br.readLine()) != null) {
        // use comma as separator
        String[] csv = line.split(cvsSplitBy);
        accountList.add(csv[0]);
        nameList.add(csv[1]);
        systemNameList.add(csv[2]);
        salesList.add(csv[3]);
    }
} catch (IOException e) {
    e.printStackTrace();
}
So I was thinking of adding them all to their own lists, then looping through all of the lists and comparing the values, but I can't wrap my head around how that would work. Any tips or words of advice are much appreciated. Thanks!
By analyzing your requirements you can get a better idea of the data structures to use. Since you need to map keys (account/company) to values (name/rep) I would start with a HashMap. Since you want to condense the values to remove duplicates you'll probably want to use a Set.
I would have a Map<Key, Data> with
public class Key {
    private String account;
    private String companyName;
    // Getters/setters/equals/hashCode
}

public class Data {
    private Key key;
    private Set<String> names = new HashSet<>();
    private Set<String> reps = new HashSet<>();

    public void addName(String name) {
        names.add(name);
    }

    public void addRep(String rep) {
        reps.add(rep);
    }
    // Additional getters/setters/equals/hashCode
}
Once you have your data structures in place, you can do the following to populate the data from your CSV and output it to its own CSV (in pseudocode)
Loop over each line in the CSV
    Build a Key from account/company
    Try to get Data from the Map
    If Data not found
        Create new Data with the Key and put the key -> data mapping in the map
    Add name and rep to the Data
Loop over the values in the map
    Output to CSV
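A rough Java sketch of that pseudocode, assuming the Key and Data classes above (the setter names and the naive quote-stripping are assumptions, not tested code):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class Condense {
    public static void main(String[] args) throws IOException {
        Map<Key, Data> map = new HashMap<>();
        try (BufferedReader br = new BufferedReader(new FileReader("something.csv"))) {
            br.readLine(); // skip the header row
            String line;
            while ((line = br.readLine()) != null) {
                // Naive parsing: strip quotes and split on commas (assumes no embedded commas)
                String[] csv = line.replace("\"", "").split(",");
                Key key = new Key();           // assumes the setters sketched above exist
                key.setAccount(csv[0]);
                key.setCompanyName(csv[2]);
                Data data = map.get(key);
                if (data == null) {
                    data = new Data();
                    data.setKey(key);
                    map.put(key, data);
                }
                data.addName(csv[1]);
                data.addRep(csv[3]);
            }
        }
        for (Data data : map.values()) {
            System.out.println(data);          // real code would format this back into CSV lines
        }
    }
}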
Well, I probably would create a class, say Account, with the attributes accountName, fullName, customerSystemName, and salesRep. Then I would define an empty ArrayList of type Account and loop over the read lines: for every line, create a new object of this class, set the corresponding attributes, and add the object to the list. But before creating the object, I would iterate over the objects already in the list to see whether one already has this company name; if so, instead of creating a new object, I would just update the salesRep attribute of the existing one by appending the new value, separated by a comma.
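A very rough outline of that idea (class and method names are only illustrative, and only the merge step is shown):

import java.util.List;

class Account {
    String accountName;
    String fullName;
    String customerSystemName;
    String salesRep;
}

class Merger {
    // Merge one parsed CSV row into the list, appending to an existing Account if the name matches
    static void addRow(List<Account> accounts, String[] csv) {
        for (Account existing : accounts) {
            if (existing.accountName.equals(csv[0])) {
                existing.fullName += ", " + csv[1]; // append, separated by a comma
                existing.salesRep += ", " + csv[3];
                return;
            }
        }
        Account a = new Account();
        a.accountName = csv[0];
        a.fullName = csv[1];
        a.customerSystemName = csv[2];
        a.salesRep = csv[3];
        accounts.add(a);
    }
}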
I hope this helps :)
I have two files as input. The first file has the roll number and the subject1 mark, and the second file has the roll number and the subject2 mark. The first file arrives via Spark Streaming and the second file sits in my HDFS. How can I split each file into key/value pairs, extract the value, and store it in a variable using Java in Spark? I tried, but I'm having difficulty extracting the mark and storing it as an integer in a variable using JavaPairRDD. Thanks in advance for the help.
JavaRDD<String> sub1MarksRDD = sc.textFile("/user/ubuntu/sub1Marks.dat");
List<String> ccList = new ArrayList<String>();
ccList = sub1MarksRDD.collect();

JavaRDD<String> sub2MarksRDD = sc.textFile("/user/ubuntu/sub2marks.dat");

JavaPairRDD<String, Integer> result = sub1MarksRDD.mapToPair(
    new PairFunction<String, String, Integer>() {
        public Tuple2<String, Integer> call(String w) {
            return new Tuple2<String, Integer>(w, 1);
        }
    }
);
How should we go about creating a pair RDD that maps the roll no/marks1 from sub1Marks.dat against the data in sub2Marks.dat? And how do we extract the marks fields based on the roll no and store them in a variable?
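One way the splitting step could look, purely as a sketch: it assumes each line is "rollNo,marks" separated by a comma, which may not match your actual file format:

// Assumes lines look like "rollNo,marks"; change the delimiter to whatever your files use
JavaPairRDD<String, Integer> sub1Pairs = sub1MarksRDD.mapToPair(
    new PairFunction<String, String, Integer>() {
        public Tuple2<String, Integer> call(String line) {
            String[] parts = line.split(",");
            return new Tuple2<String, Integer>(parts[0], Integer.parseInt(parts[1]));
        }
    }
);
JavaPairRDD<String, Integer> sub2Pairs = sub2MarksRDD.mapToPair(
    new PairFunction<String, String, Integer>() {
        public Tuple2<String, Integer> call(String line) {
            String[] parts = line.split(",");
            return new Tuple2<String, Integer>(parts[0], Integer.parseInt(parts[1]));
        }
    }
);
// join() then pairs up marks1 and marks2 by roll number
JavaPairRDD<String, Tuple2<Integer, Integer>> joined = sub1Pairs.join(sub2Pairs);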
I created a HashMap to store a text file with the columns of information. I compared the key to a specific name and stored the values of the HashMap into an ArrayList. When I try to println my ArrayList, it only outputs the last value and leaves out all the other values that match that key.
This isn't my entire code, just the two loops that read in the text file, store it in the HashMap, and then into the ArrayList. I know it has something to do with my loops.
Did some editing and got it to output, but all my values are displayed multiple times.
My output looks like this.
North America:
[ Anguilla, Anguilla, Antigua and Barbuda, Antigua and Barbuda, Antigua and Barbuda, Aruba, Aruba, Aruba,
HashMap<String, String> both = new HashMap<String, String>();
ArrayList<String> sort = new ArrayList<String>();
//ArrayList<String> sort2 = new ArrayList<String>();

// We need a try catch block so we can handle any potential IO errors
try {
    try {
        inputStream = new BufferedReader(new FileReader(filePath));
        String lineContent = null;

        // Loop will iterate over each line within the file.
        // It will stop when no new lines are found.
        while ((lineContent = inputStream.readLine()) != null) {
            String[] column = lineContent.split(",");
            both.put(column[0], column[1]);

            Set set = both.entrySet();
            // Get an iterator
            Iterator i = set.iterator();
            // Display elements
            while (i.hasNext()) {
                Map.Entry me = (Map.Entry) i.next();
                if (me.getKey().equals("North America")) {
                    String value = (String) me.getValue();
                    sort.add(value);
                }
            }
        }

        System.out.println("North America:");
        System.out.println(sort);
        System.out.println("\n");
    }
Map keys need to be unique. Your code is working according to spec.
If you need to have many values for a key, you may use
Map<key, List<T>>
where T here is String (and it doesn't have to be a List; you can use any collection).
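For example, a drop-in replacement for the both.put(column[0], column[1]) line (needs Java 8 for computeIfAbsent; the variable names come from your loop):

Map<String, List<String>> both = new HashMap<>();
// computeIfAbsent creates the list the first time a key is seen,
// so every value for "North America" is kept instead of being overwritten
both.computeIfAbsent(column[0], k -> new ArrayList<>()).add(column[1]);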
Some things seem wrong with your code:
You are iterating over the Map's entrySet to get just one value; you could just use the following code:
if (both.containsKey("North America"))
    sort.add(both.get("North America"));
It seems that you can have "North America" more than once in your input file, but you are storing it in a Map, so each time you store a new value for "North America" it overwrites the current value.
I don't know what the type of sort is, but what System.out.print(sort); prints depends on the toString() implementation of that type, and the fact that you use print() instead of println() may also create problems depending on how you run your program (some shells may not print the last line, for instance).
If you want more help, you may want to provide us with the following things:
sample of the input file
declaration of sort
sample of output
what you want to obtain.