I am asked to create a word vector space from a CSV file, so I need to extract the words and their vectors (each of size 57) into a dictionary that I can reuse in future operations.
My CSV format is giving me a lot of problems because it is basically text with a key and doubles, all separated by spaces, and so far I haven't been able to separate the string and double parts correctly.
Do you have any idea how to parse this file into a dictionary containing (key, vector) entries?
Thanks a lot.
Here is a sample of the CSV file:
key1 4.0966564 7.963437 -2.1844673 1.9319566 -0.04495791 2.454401 3.1006012 -0.3813638 1.567303 -2.2067556 3.44506744 -4.382278 4.1457844 2.342756 -2.7707205 3.5015 2.5717492 -2.6846366...
key2 -3.968007 0.86151505 0.06163538 1.918614 0.34340435 -1.5178788 1.3857365 0.230331 0.7025755 -2.6575062 -0.7426953 3.1636698 2.8441591 0.4522623 3.3907628 2.425691 -1.2052362....
.
.
.
This data structure is called a multi-map: a key can have multiple values.
You can find implementations in libraries such as Guava (Multimap) and Apache Commons Collections (MultiValuedMap).
If you'd rather not have the dependency, and wish to write your own, it might look like this:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiMap {
    private final Map<String, List<Double>> multi = new HashMap<>();

    // Appends newValue to the list stored under key, creating the list on first use.
    public void put(String key, Double newValue) {
        if (newValue != null) {
            this.multi.computeIfAbsent(key, k -> new ArrayList<>()).add(newValue);
        }
    }
}
It's possible to use generics, but I'm too lazy to bother right now. This example is correct for your narrow use case.
Split each line into tokens by splitting at regex "\\s+". The first value is the key; iterate over all the others to add them to the multi-map.
You can do something like this:
String line = "key1 4.0966564 7.963437";
// Split on any run of whitespace so repeated spaces or tabs do not produce empty tokens.
String[] parts = line.split("\\s+");
String key = parts[0];
ArrayList<Double> values = new ArrayList<Double>();
for (int i = 1; i < parts.length; i++) {
    String doubleAsString = parts[i];
    values.add(Double.valueOf(doubleAsString));
}
Then add these elements to your map.
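Putting the two suggestions above together, a minimal sketch of the whole parse might look like the following. The class name WordVectors and the choice of double[] for the vector are assumptions made for illustration; the lines are passed in as a list so the sketch stays independent of how the file is read.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns lines of the form "key v1 v2 ... vn" into key -> vector.
final class WordVectors {

    static Map<String, double[]> parse(List<String> lines) {
        Map<String, double[]> vectors = new LinkedHashMap<>();
        for (String rawLine : lines) {
            String line = rawLine.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.split("\\s+");
            double[] vector = new double[parts.length - 1];
            for (int i = 1; i < parts.length; i++) {
                vector[i - 1] = Double.parseDouble(parts[i]);
            }
            vectors.put(parts[0], vector); // first token is the key
        }
        return vectors;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "key1 4.0966564 7.963437 -2.1844673",
                "key2 -3.968007 0.86151505 0.06163538");
        Map<String, double[]> vectors = parse(sample);
        System.out.println(Arrays.toString(vectors.get("key1")));
    }
}
```

For a real file, the lines could come from Files.readAllLines(Paths.get(...)). If every vector must have exactly 57 entries, a length check inside the loop would catch malformed rows early.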
I have a CSV in this format:
"Account Name","Full Name","Customer System Name","Sales Rep"
"0x7a69","Mike Smith","0x7a69","Tim Greaves"
"0x7a69","John Taylor","0x7a69","Brian Anthony"
"Apple","Steve Jobs","apple","Anthony Michael"
"Apple","Steve Jobs","apple","Brian Anthony"
"Apple","Tim Cook","apple","Tim Greaves"
...
I would like to parse this CSV (using Java) so that it becomes:
"Account Name","Full Name","Customer System Name","Sales Rep"
"0x7a69","Mike Smith, John Taylor","0x7a69","Tim Greaves, Brian Anthony"
"Apple","Steve Jobs, Tim Cook","apple","Anthony Michael, Brian Anthony, Tim Greaves"
Essentially I just want to condense the CSV so that there is one entry per account/company name.
Here is what I have so far:
String csvFile = "something.csv";
String line = "";
String cvsSplitBy = ",";
List<String> accountList = new ArrayList<String>();
List<String> nameList = new ArrayList<String>();
List<String> systemNameList = new ArrayList<String>();
List<String> salesList = new ArrayList<String>();
try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
    while ((line = br.readLine()) != null) {
        // use comma as separator
        String[] csv = line.split(cvsSplitBy);
        accountList.add(csv[0]);
        nameList.add(csv[1]);
        systemNameList.add(csv[2]);
        salesList.add(csv[3]);
    }
} catch (IOException e) {
    e.printStackTrace();
}
So I was thinking of adding them all to their own lists, then looping through all of the lists and comparing the values, but I can't wrap my head around how that would work. Any tips or words of advice are much appreciated. Thanks!
By analyzing your requirements you can get a better idea of the data structures to use. Since you need to map keys (account/company) to values (name/rep) I would start with a HashMap. Since you want to condense the values to remove duplicates you'll probably want to use a Set.
I would have a Map<Key, Data> with
public class Key {
private String account;
private String companyName;
//Getters/Setters/equals/hashcode
}
public class Data {
private Key key;
private Set<String> names = new HashSet<>();
private Set<String> reps = new HashSet<>();
public void addName(String name) {
names.add(name);
}
public void addRep(String rep) {
reps.add(rep);
}
//Additional getters/setters/equals/hashcode
}
Once you have your data structures in place, you can do the following to populate the data from your CSV and output it to its own CSV (in pseudocode)
Loop each line in CSV
Build Key from account/company
Try to get data from Map
If Data not found
Create new data with Key and put key -> data mapping in map
add name and rep to data
Loop values in map
Output to CSV
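The pseudocode above could be sketched in Java roughly as follows. This is a simplified version under stated assumptions: the map is keyed by the account name alone rather than by a separate Key class, every field is wrapped in double quotes with no embedded quotes or commas, and the names AccountCondenser, Data, fields and condense are made up for this example. LinkedHashMap and LinkedHashSet are used instead of HashMap and HashSet so the output keeps the order in which values first appear.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class AccountCondenser {

    // Collected values for one account (hypothetical holder class).
    static final class Data {
        String systemName;
        final Set<String> names = new LinkedHashSet<>();
        final Set<String> reps = new LinkedHashSet<>();
    }

    // Strips the outer quotes, then splits on the "," between quoted fields.
    static String[] fields(String line) {
        String trimmed = line.trim();
        return trimmed.substring(1, trimmed.length() - 1).split("\",\"", -1);
    }

    // First line is the header; every other line is a data row.
    static List<String> condense(List<String> lines) {
        Map<String, Data> byAccount = new LinkedHashMap<>();
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] f = fields(line);
            Data data = byAccount.computeIfAbsent(f[0], k -> new Data());
            data.names.add(f[1]);
            data.systemName = f[2];
            data.reps.add(f[3]);
        }
        List<String> out = new ArrayList<>();
        out.add(lines.get(0)); // header is copied unchanged
        for (Map.Entry<String, Data> e : byAccount.entrySet()) {
            Data d = e.getValue();
            out.add(String.format("\"%s\",\"%s\",\"%s\",\"%s\"",
                    e.getKey(), String.join(", ", d.names),
                    d.systemName, String.join(", ", d.reps)));
        }
        return out;
    }
}
```

A CSV library would be the safer choice if fields can contain commas or escaped quotes; the manual split here is only meant to show the map-and-set bookkeeping.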
Well, I would probably create a class, say "Account", with the attributes "accountName", "fullName", "customerSystemName" and "salesRep". Then I would define an empty ArrayList of type Account and loop over the lines read from the file. Before creating an object for a line, I would iterate over the objects already in the list to see whether one of them has the same account name. If so, instead of creating a new object, I would update the existing one by appending the new fullName and salesRep values, separated by commas. Otherwise I would create a new object, set its attributes and add it to the list.
I hope this helps :)
I have a test.csv file that is formatted as:
Home,Owner,Lat,Long
5th Street,John,5.6765,-6.56464564
7th Street,Bob,7.75,-4.4534564
9th Street,Kyle,4.64,-9.566467364
10th Street,Jim,14.234,-2.5667564
I have a hash map that is populated from a file containing the same header names as the CSV, in a different format and with no accompanying data.
For example:
Map<Integer, String> container = new HashMap<>();
where,
Key, Value
[0][NULL]
[1][Owner]
[2][Lat]
[3][NULL]
I have also created a second hash map, populated like this:
BufferedReader reader = new BufferedReader (new FileReader("test.csv"));
CSVParser parser = new CSVParser(reader, CSVFormat.DEFAULT);
Boolean headerParsed = false;
CSVRecord headerRecord = null;
int i;
Map<String,String> value = new HashMap<>();
for (final CSVRecord record : parser) {
if (!headerParsed = false) {
headerRecord = record;
headerParsed = true;
}
for (i =0; i< record.size(); i++) {
value.put (headerRecord.get(0), record.get(0));
}
}
I want to read and compare the hash maps: if the container map has a value that is also a key in the value map, I put that value into the corresponding field of an object.
Example object:
public DataSet (//args) {
this.home
this.owner
this.lat
this.longitude
}
I want to create a function that compares the hash maps and, when a value map key matches a container map entry, sets the corresponding value on the object. Something really simple that is also efficient at handling the setting.
Please note: I made the CSV header and the rows finite here. In real life, the CSV could have x fields (Home, Owner, Lat, Long, houseType, houseColor, etc.) and n values associated with those fields.
First off, your approach to this problem is unnecessarily long. From what I see, all you are trying to do is this:
Select two columns from a CSV file and add them to a data structure. I emphasize the word two because a map has a key and a value: one column becomes the key and the other becomes the value.
What you should do instead:
Import the names of columns you wish to add to the data structure into two strings. (You may read them from a file).
Iterate over the CSV file using the CSVParser class, as you already do.
Store the value corresponding to the first desired column in a string, repeat with the value corresponding to the second desired column, and push them both into a DataSet object, and push the DataSet object into a List<DataSet>.
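The steps above can be sketched without any CSV library as follows. This is an illustration under assumptions, not the asker's code: it swaps CSVParser for a plain split(","), which only works when no field contains a comma, and it stores each row as a Map instead of a DataSet so that any number of wanted columns is supported. The names ColumnSelector and select are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ColumnSelector {

    // Returns one map per data row, containing only the wanted columns.
    static List<Map<String, String>> select(List<String> lines, List<String> wanted) {
        List<String> header = Arrays.asList(lines.get(0).split(","));
        List<Map<String, String>> rows = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] cells = line.split(",", -1); // -1 keeps trailing empty cells
            Map<String, String> row = new LinkedHashMap<>();
            for (String column : wanted) {
                int index = header.indexOf(column);
                // Ignore wanted columns that the header or this row does not have.
                if (index >= 0 && index < cells.length) {
                    row.put(column, cells[index]);
                }
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Home,Owner,Lat,Long",
                "5th Street,John,5.6765,-6.56464564",
                "7th Street,Bob,7.75,-4.4534564");
        System.out.println(select(lines, Arrays.asList("Owner", "Lat")));
    }
}
```

Looking the column index up once per wanted column, outside the row loop, would be the obvious optimization for large files.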
If you prefer to stick to your way of solving the problem:
Basically, the empty file is supposed to hold just the headers (column names), and that's why you named the corresponding hash map containers. The second file is supposed to contain the values and hence you named the corresponding hash map values.
First off, where you say
if (!headerParsed = false) {
headerRecord = record;
headerParsed = true;
}
you probably mean to say
if (!headerParsed) {
headerRecord = record;
headerParsed = true;
}
and where you say
for (i =0; i< record.size(); i++) {
value.put(headerRecord.get(0), record.get(0));
}
you probably mean
for (i =0; i< record.size(); i++) {
value.put(headerRecord.get(i), record.get(i));
}
i.e. You iterate over one record and store the value corresponding to each column.
I haven't tried this code on my desktop, but since the for loop also iterates over Home and Long, it will store columns you did not ask for (e.g. value.put("Home", "5th Street")), so you should add a check before calling value.put. Wrap the call in an if statement and check whether headerRecord.get(i) exists in the container hash map.
for (i = 0; i < record.size(); i++) {
    // container maps column index -> column name, so look the name up among its values
    if (container.containsValue(headerRecord.get(i))) {
        value.put(headerRecord.get(i), record.get(i));
    }
}
The thing is, the data structure itself depends on which values from the container hash map you want to store. It could be Home and Lat, or Owner and Long. So we are stuck. How about you create a data structure like the one below:
class DataSet {
    String val1;
    String val2;
}
Also, note that this DataSet is only for storing ONE row. For storing information from multiple rows, you need to create a Linked List of DataSet.
Lastly, the container file contains ALL the column names. Not all these columns will be stored in the Data Set (i.e. You chose to NULL Home and Long. You could have chosen to NULL Owner and Lat), hence the header file is not what you need to make this decision.
If you think about it, just iterate over the values hash map and store the first value in string val1 and the second value in val2.
List<DataSet> myList = new ArrayList<>();
for (Map.Entry<String, String> pair : values.entrySet()) {
    // Create a fresh DataSet per entry; reusing one instance would overwrite earlier rows.
    DataSet row = new DataSet();
    row.val1 = pair.getKey();
    row.val2 = pair.getValue();
    myList.add(row);
}
I hope this helps.
I'm trying to convert a PHP script into a Java one but coming across a few issues on a foreach loop. In the PHP script I have a foreach that takes the key:value pair and based off this does a str_replace.
foreach ($pValues AS $vKey => $vValue)
$vString = str_replace("{".$vKey."}", "'".$vValue."'", $vString);
I tried replicating this in Java without success. I need to get the key from the array to use in the string replace function, but can't find out where or if it's possible to get the key name from the array passed in.
Is this the right way or am I completely off? Should I be using the ImmutablePair method?
for (String vKey : pValues)
// String replace
Here's hoping there is an easy way to get the key:value pair in Java.
This can be achieved by using a Map as the data structure and iterating over its entrySet().
Map<K, V> entries = new HashMap<>();
for (Map.Entry<K, V> entry : entries.entrySet()) {
    // get the key with entry.getKey() and the value with entry.getValue(),
    // or set a new value with entry.setValue(V value)
}
That is not possible with a simple foreach-loop in Java.
If pValues is an array, you could use a simple for-loop:
for (int i = 0; i < pValues.length; i++)
// String replace
If pValues is a Map, you can iterate through it like this:
for (String key : map.keySet())
    string = string.replace(key, map.get(key)); // Strings are immutable, so keep the result
Group Totals
Have the function GroupTotals(strArr) read in the strArr parameter containing key:value pairs where the key is a string and the value is an integer. Your program should return a string with new key:value pairs separated by a comma such that each key appears only once with the total values summed up.
For example: if strArr is ["B:-1", "A:1", "B:3", "A:5"] then your program should return the string A:6,B:2.
Your final output string should return the keys in alphabetical order. Exclude keys that have a value of 0 after being summed up.
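One possible way to approach this, sketched under the assumption that every element has the form key:integer, is to sum into a TreeMap, which keeps its keys sorted. The class name GroupTotalsSketch is made up for the example, and the sort order is String's natural ordering, so uppercase letters sort before lowercase ones.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

final class GroupTotalsSketch {

    static String groupTotals(String[] strArr) {
        // TreeMap iterates its keys in sorted order.
        Map<String, Integer> totals = new TreeMap<>();
        for (String pair : strArr) {
            int colon = pair.lastIndexOf(':');
            String key = pair.substring(0, colon);
            int value = Integer.parseInt(pair.substring(colon + 1));
            totals.merge(key, value, Integer::sum); // add to the running total
        }
        StringJoiner joined = new StringJoiner(",");
        for (Map.Entry<String, Integer> entry : totals.entrySet()) {
            if (entry.getValue() != 0) { // drop keys that sum to zero
                joined.add(entry.getKey() + ":" + entry.getValue());
            }
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // prints A:6,B:2
        System.out.println(groupTotals(new String[] {"B:-1", "A:1", "B:3", "A:5"}));
    }
}
```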
Thanks all for the help and advice, I've managed to duplicate the function in Java using Map.
// Assumes pValues is declared as Map<String, String>.
if (pValues != null) {
    for (Map.Entry<String, String> m : pValues.entrySet()) {
        vSQL = vSQL.replace("{" + m.getKey() + "}", "'" + m.getValue() + "'");
        vSQL = vSQL.replace("[" + m.getKey() + "]", m.getValue());
    }
}
I am trying to get some values from a config file. I have a lot of keys and want only certain values. These values have keys that start with the same name and differ slightly towards the end.
Can someone help me quickly?
Assuming that when you say key you mean value (as in values in an array):
final String PREFIX = "yourPrefix";
for (String value : valueList) {
    if (value.startsWith(PREFIX)) {
        // do whatever...
    }
}
Here is the link to the Javadoc:
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#startsWith(java.lang.String)
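If the keys really are keys in a key=value style config file, the same startsWith check can be applied to the property names instead. The sketch below assumes the file is in java.util.Properties format; the class name PrefixedConfig, the method withPrefix and the sample keys are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

final class PrefixedConfig {

    // Collects every property whose key starts with the given prefix.
    static Map<String, String> withPrefix(Properties props, String prefix) {
        Map<String, String> matches = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(prefix)) {
                matches.put(name, props.getProperty(name));
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // A StringReader stands in for a FileReader on the real config file.
        props.load(new StringReader("server.host=example\nserver.port=8080\nclient.id=42\n"));
        // prints {server.host=example, server.port=8080}
        System.out.println(withPrefix(props, "server."));
    }
}
```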
I am assuming you are scanning the config file for strings that share a prefix. Rather than reading everything into one hash map, why not group them as you scan? If you already know the prefixes, create an ArrayList for each one and, while scanning, add each token to the list that matches its prefix.
// Splitting on spaces and commas so that tokens such as "keyId_11503," lose the trailing comma.
StringTokenizer s = new StringTokenizer("Configuration File : Server_intenties = keyId_11503, keyId_11903 : Server_passcodes = keyCode_1678, keyCode_9893", " ,");
ArrayList<String> keyCode = new ArrayList<>();
ArrayList<String> keyId = new ArrayList<>();
while (s.hasMoreTokens()) {
    String key = s.nextToken();
    if (key.contains("keyId")) {
        keyId.add(key);
    }
    if (key.contains("keyCode")) {
        keyCode.add(key);
    }
}
System.out.println(keyCode);
System.out.println(keyId);
How do I read strings from a text file and store in a hashmap?
File contains two columns.
File is like:
FirstName LastName
Pranay Suyash and so on...
Here's one way:
import java.io.*;
import java.util.*;

class Test {
    public static void main(String[] args) throws FileNotFoundException {
        HashMap<String, String> map = new HashMap<String, String>();
        // try-with-resources closes the scanner (and the underlying file) when done
        try (Scanner scanner = new Scanner(new FileReader("filename.txt"))) {
            while (scanner.hasNextLine()) {
                String[] columns = scanner.nextLine().split(" ");
                map.put(columns[0], columns[1]);
            }
        }
        System.out.println(map);
    }
}
Given input:
somekey somevalue
someotherkey someothervalue
this prints
{someotherkey=someothervalue, somekey=somevalue}
If your lines look different, I suggest you either fetch columns[0] and columns[1] and do your string manipulation as needed or, if you're comfortable with regular expressions, use Pattern / Matcher to match the line against a pattern and get the content from the capture groups.
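As a rough illustration of the Pattern / Matcher route, the sketch below treats the first whitespace-delimited token as the key and the rest of the line as the value. The pattern, the class name TwoColumnParser and the decision to skip lines that do not match are all assumptions chosen for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TwoColumnParser {

    // Group 1: first token (the key). Group 2: remainder of the line, trimmed (the value).
    private static final Pattern LINE = Pattern.compile("^\\s*(\\S+)\\s+(.*\\S)\\s*$");

    static Map<String, String> parse(List<String> lines) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher matcher = LINE.matcher(line);
            if (matcher.matches()) {
                map.put(matcher.group(1), matcher.group(2));
            }
            // Lines with fewer than two columns are silently skipped.
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(parse(Arrays.asList("somekey somevalue", "someotherkey   some other value  ")));
    }
}
```

Unlike split(" "), this tolerates leading, trailing and repeated whitespace, and it keeps spaces inside the second column.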
If you want to map each row of the two columns in a hash map, you can use the first column value as the key and the second column value as the value. Keys must be unique in a HashMap, though. If the first column values are unique, you can use the following approach:
Map<String,String> map = new HashMap<String,String>();
map.put(firstColVal,secondColVal);
Just in case
your keys (first column) don't contain spaces and
your columns are separated by either a :, a = or a whitespace character (except newline)
then this may work:
Map<Object, Object> map = new Properties();
((Properties) map).load(new FileReader("inputfile.txt"));
Just saw your sample input... You shouldn't put that data in a map, unless it is guaranteed that all firstnames are unique.
Otherwise this will happen:
map.put("Homer", "Simpson"); // new key/value pair
map.put("Bart", "Simpson"); // new key/value pair
map.put("Homer", "Johnsson"); // value for "Homer" is replaced with "Johnsson"
System.out.println(map.get("Homer")); // guess what happens..
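If first names can repeat and you still want to look people up by first name, one option is to map each key to a list of values instead of a single value. This is only a sketch; the class name NameIndex and the helper add are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class NameIndex {

    // Appends last to the list stored under first, creating the list on first use.
    static void add(Map<String, List<String>> index, String first, String last) {
        index.computeIfAbsent(first, k -> new ArrayList<>()).add(last);
    }

    public static void main(String[] args) {
        Map<String, List<String>> index = new HashMap<>();
        add(index, "Homer", "Simpson");
        add(index, "Bart", "Simpson");
        add(index, "Homer", "Johnsson");
        // Both last names are kept: prints [Simpson, Johnsson]
        System.out.println(index.get("Homer"));
    }
}
```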