I am trying to write a Java program that loads the data (from a tab delimited DAT file) and determines the average amount in Euros (EUR), grouped by Country and Credit Rating.
I have two questions:
What is the best way to load the data into a data structure after splitting each line into an array?
How do I approach providing group-by functionality in Java?
Update: I have made a first attempt, and this is how the implementation looks. It feels like there is room for improvement.
/**
* @param rows - Each row as a bean
* This method will group objects together based on Country/City and Credit Rating
*/
static void groupObjectsTogether(List<CompanyData> rows) {
Map<String, List<CompanyData>> map = new HashMap<String, List<CompanyData>>();
for(CompanyData companyData : rows){
String key;
if(companyData.getCountry() == null || companyData.getCountry().trim().isEmpty()){ //null check must come first to avoid a NullPointerException
key = companyData.getCity()+":"+companyData.getCreditRating(); //use city+creditRating as key
}else{
key = companyData.getCountry()+":"+companyData.getCreditRating(); //use country+creditRating as key
}
if(map.get(key) == null){
map.put(key, new ArrayList<CompanyData>());
}
map.get(key).add(companyData);
}
processGroupedRowsAndPrint(map);
}
It all depends on the amount of data and the performance (CPU vs memory) of the machine. If the amount of data is not significant (less than millions of records or columns) and the number of columns is fixed, then you may simply put all the data in arrays using
String[] row = rowStr.split(";");
which splits each row using ; as the delimiter (for a tab-delimited file, use "\t" instead). Then you can achieve your grouping functionality using a HashMap, e.g.:
ArrayList<String[]> rowAr = new ArrayList<String[]>();
HashMap<String,ArrayList<Integer>> map = new HashMap<String,ArrayList<Integer>>();
int index = 0;
for (String rowStr: rows) {
    String[] row = rowStr.split(";");
    rowAr.add(row);
    String companyCode = row[0];
    //create the list of row indexes the first time a company code is seen
    if (map.get(companyCode) == null) {
        map.put(companyCode, new ArrayList<Integer>());
    }
    map.get(companyCode).add(index);
    index++;
}
Sorry for any syntax or other simple errors above (I do not have any tools in hand to verify if there is not any stupid mistake).
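For the averaging part of the question, here is a minimal sketch using streams. It assumes Java 9+ and a hypothetical Loan row type, and it assumes that country, credit rating and EUR amount are the first three tab-separated columns; the real layout of the DAT file is not shown in the question, so adjust the indexes accordingly.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupAverages {
    // Hypothetical row type; the field names and column order are assumptions.
    static final class Loan {
        final String country;
        final String creditRating;
        final double amountEur;

        Loan(String country, String creditRating, double amountEur) {
            this.country = country;
            this.creditRating = creditRating;
            this.amountEur = amountEur;
        }
    }

    // Parses one tab-delimited line: country, credit rating, amount in EUR.
    static Loan parse(String line) {
        String[] fields = line.split("\t", -1);
        return new Loan(fields[0], fields[1], Double.parseDouble(fields[2]));
    }

    // Averages the EUR amount per "country:creditRating" key, sorted by key.
    static Map<String, Double> averageByCountryAndRating(List<Loan> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                row -> row.country + ":" + row.creditRating,
                TreeMap::new,
                Collectors.averagingDouble(row -> row.amountEur)));
    }

    public static void main(String[] args) {
        List<Loan> rows = List.of(
                parse("DE\tAAA\t100.0"),
                parse("DE\tAAA\t300.0"),
                parse("FR\tBB\t50.0"));
        System.out.println(averageByCountryAndRating(rows)); // {DE:AAA=200.0, FR:BB=50.0}
    }
}
```

Using a composite string key mirrors the approach in the question; a small key class with equals and hashCode would work as well if the values can contain ":".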
I have a test.csv file that is formatted as:
Home,Owner,Lat,Long
5th Street,John,5.6765,-6.56464564
7th Street,Bob,7.75,-4.4534564
9th Street,Kyle,4.64,-9.566467364
10th Street,Jim,14.234,-2.5667564
I have a hashmap populated from a file that contains the same headers as the CSV, in a different format and with no accompanying data.
For example:
Map<Integer, String> container = new HashMap<>();
where,
Key, Value
[0][NULL]
[1][Owner]
[2][Lat]
[3][NULL]
I have also created a second hash map, populated as follows:
BufferedReader reader = new BufferedReader (new FileReader("test.csv"));
CSVParser parser = new CSVParser(reader, CSVFormat.DEFAULT);
Boolean headerParsed = false;
CSVRecord headerRecord = null;
int i;
Map<String,String> value = new HashMap<>();
for (final CSVRecord record : parser) {
if (!headerParsed = false) {
headerRecord = record;
headerParsed = true;
}
for (i =0; i< record.size(); i++) {
value.put (headerRecord.get(0), record.get(0));
}
}
I want to read and compare the hashmaps: if the container map has a value that is also in the value map, then I put the corresponding value into an object.
example object
public DataSet (//args) {
this.home
this.owner
this.lat
this.longitude
}
I want to create a function that compares the hashmaps and, when a value map key matches an entry in the container map, sets the corresponding value in the object. Something really simple that is also efficient at handling the setting.
Please note: I made the CSV header and the rows finite here. In real life, the CSV could have any number of fields (Home, Owner, Lat, Long, houseType, houseColor, etc.) and any number of values associated with those fields.
First off, your approach to this problem is unnecessarily long. From what I see, all you are trying to do is this:
Select two columns from a CSV file and add them to a data structure. I stress the word two because in a map you have a key and a value: one column becomes the key, and the other becomes the value.
What you should do instead:
Import the names of the columns you wish to add to the data structure into two strings. (You may read them from a file.)
Iterate over the CSV file using the CSVParser class, as you did.
Store the value corresponding to the first desired column in a string, do the same for the second desired column, push them both into a DataSet object, and push the DataSet object into a List<DataSet>.
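The steps above can be sketched with the standard library alone. This is only an illustration under stated assumptions: ColumnPicker and pick are hypothetical names, the CSV is assumed to have no quoted or escaped commas (CSVParser handles those for you), and each row is kept as a map so that it also works for more than two wanted columns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ColumnPicker {
    // Keeps only the wanted columns of each data line, keyed by header name.
    // The first line is treated as the header row.
    static List<Map<String, String>> pick(List<String> lines, Set<String> wanted) {
        List<Map<String, String>> rows = new ArrayList<>();
        if (lines.isEmpty()) {
            return rows;
        }
        String[] header = lines.get(0).split(",", -1);
        for (String line : lines.subList(1, lines.size())) {
            String[] fields = line.split(",", -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < header.length && i < fields.length; i++) {
                if (wanted.contains(header[i])) {
                    row.put(header[i], fields[i]);
                }
            }
            rows.add(row);
        }
        return rows;
    }
}
```

With the sample file and the wanted set {"Owner", "Lat"}, each resulting map holds just those two entries, which can then be copied into a DataSet.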
If you prefer to stick to your way of solving the problem:
Basically, the empty file is supposed to hold just the headers (column names), and that's why you named the corresponding hash map containers. The second file is supposed to contain the values and hence you named the corresponding hash map values.
First off, where you say
if (!headerParsed = false) {
headerRecord = record;
headerParsed = true;
}
you probably mean to say
if (!headerParsed) {
headerRecord = record;
headerParsed = true;
}
and where you say
for (i =0; i< record.size(); i++) {
value.put(headerRecord.get(0), record.get(0));
}
you probably mean
for (i =0; i< record.size(); i++) {
value.put(headerRecord.get(i), record.get(i));
}
i.e. you iterate over one record and store the value corresponding to each column.
Now I haven't tried this code on my desktop, but since the for loop also iterates over Home and Long, it will also store columns you do not want (e.g. value.put("Home", "5th Street")). A HashMap will not raise an error for this, so you should add an explicit check before calling value.put: wrap it inside an if conditional and check whether headerRecord.get(i) even exists in the container hash map.
for (i = 0; i < record.size(); i++) {
    //container maps a column position to a header name, so look the name up among its values
    if (container.containsValue(headerRecord.get(i))) {
        value.put(headerRecord.get(i), record.get(i));
    }
}
Now the thing is that the data structure itself depends on which values from the container hash map you want to store. It could be Home and Lat, or Owner and Long. So we are stuck. How about you create a data structure like the one below:
class DataSet {
    String val1;
    String val2;
}
Also, note that this DataSet is only for storing ONE row. For storing information from multiple rows, you need to create a Linked List of DataSet.
Lastly, the container file contains ALL the column names. Not all these columns will be stored in the Data Set (i.e. You chose to NULL Home and Long. You could have chosen to NULL Owner and Lat), hence the header file is not what you need to make this decision.
If you think about it, just iterate over the values hash map and store the first value in string val1 and the second value in val2.
List<DataSet> myList = new ArrayList<>();
//"value" is the map filled from the CSV in the loop above
for (Map.Entry<String, String> pair : value.entrySet()) {
    DataSet row = new DataSet(); //a new object per entry, otherwise every list element is the same row
    row.val1 = pair.getKey();
    row.val2 = pair.getValue();
    myList.add(row);
}
I hope this helps.
Using the methods in Get class in the org.apache.hadoop.hbase.client package, I am able to retrieve only one record at a time based on the row key.
Is there any way to retrieve multiple rows based on the value of some other column that is not the row key (in Java)?
public ArrayList<HashMap<String, String>> getDataQualifierListByRowkeyList(String tableName,
ArrayList<String> rowkeyList) throws IOException {
Table table = connection.getTable(TableName.valueOf(tableName));
List<Get> getList = new ArrayList<Get>();
for (String rowkey : rowkeyList) {
Get get = new Get(Bytes.toBytes(rowkey));
getList.add(get);
}
Result[] results = table.get(getList);
var funcResult = new ArrayList<HashMap<String, String>>();
for (Result result : results) {
var tempHashMap = new HashMap<String, String>();
for (Cell cell : result.rawCells()) {
tempHashMap.put(
String.format("%s:%s", Bytes.toString(CellUtil.cloneFamily(cell)),
Bytes.toString(CellUtil.cloneQualifier(cell))),
Bytes.toString(CellUtil.cloneValue(cell)));
}
funcResult.add(tempHashMap);
}
return funcResult;
}
Get retrieves only the single record that matches a row key. Get can also be used with a list of Gets (a list of row keys), but that still requires the row keys, and you need to look up rows by a field other than the row key. For that, you can use a Scan with column value filters. This approach will have higher latency than a Get.
If possible, the best approach is to design the row key so that it includes the frequently searched field along with the main row key. This only matters if you need better latency.
I have an array that was created from an ArrayList, which was in turn created from a ResultSet. This array contains rows of a database table, and each row (with several columns based on my query) exists as a single element in the array. So far so good. My problem is how to get the individual values (columns) from each row which, as I said earlier, now exists as an element. I can get each element (a row, of course), but that is not what I want. Each element is a composite of several values, so how do I get those? I am a beginner and really stuck here. I hope this all makes sense. Here is the code showing how I created the array.
List resultsetRowValues = new ArrayList();
while (resultSet.next()){
for (int i = 1; i <= columnCount; i++) {
resultsetRowValues.add(resultSet.getString(i));
}
}
String[] databaseRows = (String[]) resultsetRowValues.toArray(new String[resultsetRowValues.size()]);
EDIT: More explanation
My MySQL query is as follows:
String query = "SELECT FIRSTNAME, LASTNAME, ADDRESS FROM SOMETABLE WHERE CITY='SOMECITY'";
This returns several rows in a ResultSet. According to the sample query, each element of the array will contain three values (columns), i.e. FIRSTNAME, LASTNAME and ADDRESS. But these three values exist in the array as a single element, while I want each column separately from each element (which is actually a row of the database table). When I iterate through the array using a for loop and print the values to the console, I get output similar to the following:
Doe
Jhon
Some Street (End of First element)
Smith
Jhon
Some Apartment (End of Second element and so on)
As is evident from the output, each element of the array contains three values, which are printed on separate lines.
How do I get these individual values?
You probably want something like this:
List<Map<String, String>> data = new ArrayList<>();
while (resultSet.next()){
Map<String, String> map = new HashMap<>();
for (int i = 1; i <= columnCount; i++) {
map.put("column" + i, resultSet.getString(i));
}
data.add(map);
}
// usage: data.get(2).get("column12") returns line 3 / column 12
Note that there are other possible options (2D-array, guava Table, ...)
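If you would rather keep the flat list from the question, the values can still be addressed: with zero-based row r and column c, the cell sits at index r * columnCount + c. As a minimal sketch (RowChunks and toRows are hypothetical names, not part of JDBC), the flat list can also be regrouped into one String[] per row:

```java
import java.util.ArrayList;
import java.util.List;

class RowChunks {
    // Splits a flat list of cell values into rows of columnCount cells each.
    // Trailing cells that do not fill a whole row are ignored.
    static List<String[]> toRows(List<String> flat, int columnCount) {
        if (columnCount <= 0) {
            throw new IllegalArgumentException("columnCount must be positive");
        }
        List<String[]> rows = new ArrayList<>();
        for (int start = 0; start + columnCount <= flat.size(); start += columnCount) {
            rows.add(flat.subList(start, start + columnCount).toArray(new String[0]));
        }
        return rows;
    }
}
```

With the sample output from the question and columnCount = 3, toRows(...).get(1)[2] is "Some Apartment".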
I am getting data for a particular user id from 14 tables as shown below. As part of data, I am extracting user_id, record_name and record_value and then I get timestamp from record_name (by splitting on it) and then populate my TreeMap with key as timestamp and value as record_value.
After that I am extracting 100 most recent record_value from valueTimestampMap and then populating it in my recordValueHolder LinkedList.
In my case 100 most recent means by looking at the timestamp not the way they are coming.
Below is my code -
public List<String> getData(String userId) {
List<String> recordValueHolder = new LinkedList<String>();
Map<Long, String> valueTimestampMap = new TreeMap<Long, String>(Collections.reverseOrder());
for (int tableNumber = 0; tableNumber < 14; tableNumber++) {
String sql = "select * from table_" + tableNumber + " where user_id='" + userId + "';";
SimpleStatement query = new SimpleStatement(sql);
query.setConsistencyLevel(ConsistencyLevel.QUORUM);
ResultSet res = session.execute(query);
Iterator<Row> rows = res.iterator();
while (rows.hasNext()) {
Row r = rows.next();
String user_id = r.getString("user_id"); // get user id
String record_name = r.getString("record_name"); // get record name
String record_value = r.getString("record_value"); // get record value
long timestamp = Long.parseLong(record_name.split("\\.")[1]);
// populate my tree map
valueTimestampMap.put(timestamp, record_value);
}
}
// now extract 100 most recent record_value since
// valueTimestampMap is already sorted basis on key in
// descending order
for (Map.Entry<Long, String> entry : valueTimestampMap.entrySet()) {
if (recordValueHolder.size() > 99)
break;
recordValueHolder.add(entry.getValue());
}
return recordValueHolder;
}
I am sorting TreeMap in descending order of the keys by using Collections.reverseOrder() so that I have most recent timestamps at the top and then I can simply extract 100 most recent record_value from it and that's what my above code does.
Problem Statement:-
I have 100 most recent record_value in recordValueHolder List. Now I also need to find out which tableNumber each record_value out of 100 came from and what was the record_name for that record_value as well?
So I was thinking to make a data structure something like below which can hold 100 most recent record_value along with their tableNumber, record_name and timestamp as well.
public class RecordValueTimestampTableHolder {
private long timestamp;
private String recordName;
private String recordValue;
private Integer tableNumber;
// setters and getters
}
So the size of List<RecordValueTimestampTableHolder> should be 100. Is this possible to do with my current setup? I am not able to understand how to make this work?
Now my return data type of getData method will change and instead of returning List<String>, now it will return List<RecordValueTimestampTableHolder> which will have 100 most recent record_values along with other values as well.
Instead of using a TreeMap<Long, String>, use TreeMap<Long, RecordValueTimestampTableHolder>
Instead of using
valueTimestampMap.put(timestamp, record_value);
use:
valueTimestampMap.put(timestamp, new RecordValueTimestampTableHolder(timestamp, record_name, record_value, tableNumber));
Of course, this means you will have to add a constructor to RecordValueTimestampTableHolder that accepts the four parameters and assigns them to the internal fields.
As you said recordValueHolder will have to be defined as a List<RecordValueTimestampTableHolder> and this will also have to be the return type from this method.
Filling it will be exactly like you fill it now. Though personally I'd use valueTimestampMap.values() to iterate.
int i = 0;
for (RecordValueTimestampTableHolder item : valueTimestampMap.values()) {
recordValueHolder.add(item);
if (++i == 100)
break;
}
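Put together, the idea can be sketched as follows. This is a cut-down illustration rather than the full getData method: Holder stands in for RecordValueTimestampTableHolder, and mostRecent is a hypothetical helper that takes already-fetched records instead of querying the 14 tables. Note that a map keeps one value per key, so two records with the same timestamp would overwrite each other; the sketch assumes timestamps are unique.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

class RecentRecords {
    // Stand-in for RecordValueTimestampTableHolder with the four-argument constructor.
    static final class Holder {
        final long timestamp;
        final String recordName;
        final String recordValue;
        final int tableNumber;

        Holder(long timestamp, String recordName, String recordValue, int tableNumber) {
            this.timestamp = timestamp;
            this.recordName = recordName;
            this.recordValue = recordValue;
            this.tableNumber = tableNumber;
        }
    }

    // Returns up to `limit` holders, newest timestamp first.
    static List<Holder> mostRecent(List<Holder> all, int limit) {
        TreeMap<Long, Holder> byTimestamp = new TreeMap<>(Collections.reverseOrder());
        for (Holder holder : all) {
            byTimestamp.put(holder.timestamp, holder);
        }
        List<Holder> result = new ArrayList<>();
        for (Holder holder : byTimestamp.values()) {
            if (result.size() >= limit) {
                break;
            }
            result.add(holder);
        }
        return result;
    }
}
```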
Say I have a couple of columns say First Name, LastName, Email, Phone.
I want to query for a row based on a dynamic column selection.
Say the application will ask for a record based on 1) lastname and phone or 2) FirstName 3) Phone and Email
Instead of creating a table to do a SQL query to find a row based on the column data is there a data structure which suits my needs? I am coding in Java, so if there is an inbuilt API please suggest one
FirstName | LastName | Email | Phone
abc | xyz | abc@m.com | 123
pqr | qwe | pqr@m.com | 342
ijk | uio | ijk@m.com | 987
I'd point you to any of the available in-memory SQL DB libraries:
H2
Derby
HSQL
Or maybe you want an indexable, queryable in-memory store:
Hazelcast
Ehcache
Any one of these allows you to write a query against the data stored.
If you want to have the information loaded into memory and available for multiple queries, I would use a lookup structure using a Map (e.g. a HashMap) and ArrayList.
Note: If you're only going to query once, I would do it directly in the loop when reading the lines.
E.g.: HashMap<String, ArrayList<wordLocation>> lookup= new HashMap<String, ArrayList<wordLocation>>();
Example:
import java.util.ArrayList;
import java.util.HashMap;
public class WordLookup {
public static void main(String args[]) {
WordLookup wl = new WordLookup();
String[] simulatedFileRows = new String[5];
simulatedFileRows[0] = "cat,dog";
simulatedFileRows[1] = "hen,dog";
simulatedFileRows[2] = "cat,mouse";
simulatedFileRows[3] = "moose,squirrel";
simulatedFileRows[4] = "chicken,rabbit";
String columns[];
String row;
int column = 0;
for(int i=0; i<simulatedFileRows.length; i++) //Simulated readline
{
row = simulatedFileRows[i];
columns = row.split(",");
column=0;
for(String col:columns)
{
column++;
wl.addWord(col, i, column);
}
}
//Where is moose?
ArrayList<wordLocation> locs = wl.getWord("moose");
if(locs!=null)
{
System.out.println("Moose found at:");
for(wordLocation loc: locs)
System.out.println("\t line:"+ loc.line + " column" + loc.column);
}
}
private HashMap<String, ArrayList<wordLocation>> lookup= new HashMap<String, ArrayList<wordLocation>>();
public void addWord(String word, int line, int column)
{
ArrayList<wordLocation> wordLocArr = lookup.get(word);
if(wordLocArr == null)
{
wordLocArr = new ArrayList<wordLocation>();
lookup.put(word,wordLocArr);
}
wordLocArr.add( new wordLocation(line, column));
}
public ArrayList<wordLocation> getWord(String word)
{
return lookup.get(word);
}
class wordLocation{
public int line, column;
public wordLocation(int l, int c)
{this.line = l; this.column = c;}
}
}
Suppose you have something like a HashMap<String, String> map of field => value.
Then you can do the following (if you don't want to filter by value, you can leave out the WHERE clause):
if (!map.isEmpty()) {
    StringBuilder whereStatement = new StringBuilder(" WHERE 1=1");
    StringBuilder selectStatement = new StringBuilder();
    for (String field : map.keySet()) {
        //use a ? placeholder and bind map.get(field) through a PreparedStatement
        whereStatement.append(" AND ").append(field).append(" = ?");
        if (selectStatement.length() > 0) {
            selectStatement.append(", ");
        }
        selectStatement.append(field);
    }
    String query = "SELECT " + selectStatement + " FROM sometable" + whereStatement;
}
If you don't index the columns in an SQL DB, that's roughly equivalent to simply having an array, where each element corresponds to a row.
If you do index the columns, that's about the same as additionally having something like a TreeMap (of string or integer or some collection of objects, depending on the type of the fields, to array index) for each index (at least based on my somewhat limited knowledge of the underlying structure of DBs - actually I think databases typically use b-trees, but there isn't a b-tree structure in the standard Java API to my knowledge).
Actually a TreeMap to array index isn't sufficient for non-unique indices, you'll have to have a TreeMap to a list of array indices, or a MultiMap (not in the standard API).
If you don't have an index for any given query, you'll have to iterate through all the rows to find the correct one. Similarly, you'll have to iterate through the whole array to find the correct element.
So, if you only want to query single columns (and do so efficiently), and this can be any of the columns, you'll have to have a TreeMap (or similar) for each column, and an array or similar as a base structure.
If, however, we're talking about querying any combination of columns, you're unlikely to get a particularly efficient generic solution, as there would simply be too many combinations to have a structure for all of them, even for a small number of columns.
Note: I say TreeMap as opposed to HashMap, as this is closer to how databases actually work. If the types of queries you're running don't require sorted data, you could happily use a HashMap instead.
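To make the single-column case concrete, here is a minimal sketch of a list of rows as the base structure plus one TreeMap per column, each mapping a cell value to the positions of the rows that hold it (so non-unique values are supported). ColumnIndexedTable and its methods are hypothetical names; nothing like this ships in the standard API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ColumnIndexedTable {
    private final List<String[]> rows = new ArrayList<>();
    // One index per column: cell value -> positions of the rows holding that value.
    private final List<Map<String, List<Integer>>> indexes = new ArrayList<>();

    ColumnIndexedTable(int columnCount) {
        for (int c = 0; c < columnCount; c++) {
            indexes.add(new TreeMap<>());
        }
    }

    // Appends a row and registers each of its cells in the matching column index.
    void add(String... row) {
        int position = rows.size();
        rows.add(row);
        for (int c = 0; c < indexes.size(); c++) {
            indexes.get(c).computeIfAbsent(row[c], k -> new ArrayList<>()).add(position);
        }
    }

    // Returns every row whose cell in the given column equals value.
    List<String[]> findBy(int column, String value) {
        List<String[]> result = new ArrayList<>();
        for (int position : indexes.get(column).getOrDefault(value, Collections.emptyList())) {
            result.add(rows.get(position));
        }
        return result;
    }
}
```

A query on several columns (e.g. LastName and Phone) could then look up one column through its index and filter the returned rows on the others, which avoids keeping a structure per combination.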