read an xls file using hashmap - java

If I want to read data from an Oracle table into a hashmap, I can do it like this:
String sql = "select * from DPY_VW_REP_DELIVERY_DTLS where weighed_date between ? and ?";
Object[] queryParams = new Object[] {dateFrom, dateTo};
List rsList = this.getJdbcTemplate().queryForList(sql, queryParams);
Iterator it = rsList.iterator();
while (it.hasNext())
{
    try
    {
        // each row comes back as a LinkedHashMap keyed by column name
        LinkedHashMap map = (LinkedHashMap) it.next();
        String[] strData = new String[14];
        strData[0] = map.get("WEIGHED_DATE_AS_CHAR").toString();
        strData[1] = map.get("WEIGHED_DAY_SLNO").toString();
        strData[2] = map.get("PARTY_NAME").toString();
        strData[3] = map.get("PARTY_ADDRESS1").toString();
        strData[4] = map.get("PARTY_ADDRESS2").toString();
        strData[5] = map.get("VEHICLE_NO").toString();
    }
    catch (Exception e)
    {
        e.printStackTrace();
    }
}
But if I want to read an xls file containing the same data into a hashmap, how can I do it?

Not sure what you mean by 'using hashmap'. JdbcTemplate.queryForList returns results mapped to a List of HashMaps.
It may be easier to read Excel files using Apache POI or a similar tool. Alternatively, if you are using the spreadsheet as a database, you can use the JDBC-ODBC bridge and then execute your SQL query against it. Here is an example for this approach.
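If you go the Apache POI route instead, here is a minimal sketch (not the asker's code) that reads the first sheet of an .xls/.xlsx file into a List of LinkedHashMaps keyed by the column names in the header row, so each row can be read the same way as the JDBC result (e.g. map.get("PARTY_NAME")). It assumes poi and poi-ooxml are on the classpath and that the first row of the sheet holds the column names:

import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import org.apache.poi.ss.usermodel.*;

public class XlsReader {
    public static List<LinkedHashMap<String, String>> read(File xlsFile) throws Exception {
        List<LinkedHashMap<String, String>> rows = new ArrayList<>();
        DataFormatter fmt = new DataFormatter(); // renders each cell as the text Excel would display
        try (Workbook wb = WorkbookFactory.create(xlsFile)) {
            Sheet sheet = wb.getSheetAt(0);
            Row header = sheet.getRow(0); // first row holds the column names
            for (int r = 1; r <= sheet.getLastRowNum(); r++) {
                Row row = sheet.getRow(r);
                if (row == null) continue;
                LinkedHashMap<String, String> map = new LinkedHashMap<>();
                for (int c = 0; c < header.getLastCellNum(); c++) {
                    map.put(fmt.formatCellValue(header.getCell(c)), fmt.formatCellValue(row.getCell(c)));
                }
                rows.add(map);
            }
        }
        return rows;
    }
}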

Related

Mapping VCF file into POJO

I am trying to map a .vcf file to my DTO (a simple POJO); here is what I have tried:
Code:
List<VCard> list = Ezvcard.parse(fr).all(); // fr is a FileReader
List<SomeDto> someDtoList = new ArrayList<>();
for (Iterator<VCard> iterator = list.iterator(); iterator.hasNext();) {
    VCard vCard = iterator.next();
    SomeDto someDto = new SomeDto();
    someDto.setFirstName(vCard.getFormattedName().getValue());
    someDto.setEmailAddress(vCard.getEmails().get(0).getValue());
    someDto.setMobilePhone(vCard.getTelephoneNumbers().get(0).getText());
    someDtoList.add(someDto);
}
return someDtoList;
Is there a simplified way of handling this? For example, built-in methods that take care of the mapping even if the DTO has more properties, so we can avoid the manual work?
I am using vCard JAR: https://github.com/mangstadt/ez-vcard
Maybe try this out: https://github.com/mangstadt/ez-vcard. It was originally developed at Google and offers a lot of possibilities.
You may try converting your vCards to JSON to have more possibilities for further processing.
String json = Ezvcard.writeJson(vcard).go();
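If you want the jCard JSON for the whole parsed list in one go, Ezvcard.writeJson also accepts a collection of vCards (to the best of my knowledge), so something like this should work:
String json = Ezvcard.writeJson(list).go(); // list is the List<VCard> from the question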

Using ELKI with Mongodb

Using test cases I was able to see how ELKI can be used directly from Java, but now I want to read my data from MongoDB and then use ELKI to cluster geographic (long, lat) data.
So far I can only cluster data from a CSV file using ELKI. Is it possible to connect de.lmu.ifi.dbs.elki.database.Database with MongoDB? I can see from the Java debugger that there is a databaseconnection field in de.lmu.ifi.dbs.elki.database.Database.
I query MongoDB, creating a POJO for each row, and now I want to cluster these objects using ELKI.
It is possible to read the data from MongoDB, write it to a CSV file and then have ELKI read that CSV file, but I would like to know if there is a simpler solution.
---------FINDINGS_1:
From ELKI - Use List<String> of objects to populate the Database I found that I need to implement de.lmu.ifi.dbs.elki.datasource.DatabaseConnection and, specifically, override the loadData() method, which returns an instance of MultipleObjectsBundle.
So I think I should wrap a list of POJOs in a MultipleObjectsBundle. Now I'm looking at MultipleObjectsBundle and it looks like the data should be held in columns. Why is the columns datatype List<List<?>>? Shouldn't it just be a List of the items you want to cluster?
I'm a little confused. How is ELKI going to know that it should look at the long and lat fields of the POJO? Where do I tell ELKI to do this? Using de.lmu.ifi.dbs.elki.data.type.SimpleTypeInformation?
---------FINDINGS_2:
I have tried to use ArrayAdapterDatabaseConnection and I have also tried implementing DatabaseConnection. Sorry, I need things explained in very simple terms for me to understand.
This is my code for clustering:
int minPts=3;
double eps=0.08;
double[][] data1 = {{-0.197574246, 51.49960695}, {-0.084605692, 51.52128377}, {-0.120973687, 51.53005939}, {-0.156876, 51.49313},
{-0.144228881, 51.51811784}, {-0.1680743, 51.53430039}, {-0.170134484,51.52834133}, { -0.096440751, 51.5073853},
{-0.092754157, 51.50597426}, {-0.122502346, 51.52395143}, {-0.136039674, 51.51991453}, {-0.123616824, 51.52994371},
{-0.127854211, 51.51772703}, {-0.125979294, 51.52635795}, {-0.109006325, 51.5216612}, {-0.12221963, 51.51477076}, {-0.131161087, 51.52505093} };
// ArrayAdapterDatabaseConnection dbcon = new ArrayAdapterDatabaseConnection(data1);
DatabaseConnection dbcon = new MyDBConnection();
ListParameterization params = new ListParameterization();
params.addParameter(de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN.Parameterizer.MINPTS_ID, minPts);
params.addParameter(de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN.Parameterizer.EPSILON_ID, eps);
params.addParameter(DBSCAN.DISTANCE_FUNCTION_ID, EuclideanDistanceFunction.class);
params.addParameter(AbstractDatabase.Parameterizer.DATABASE_CONNECTION_ID, dbcon);
params.addParameter(AbstractDatabase.Parameterizer.INDEX_ID, RStarTreeFactory.class);
params.addParameter(RStarTreeFactory.Parameterizer.BULK_SPLIT_ID, SortTileRecursiveBulkSplit.class);
params.addParameter(AbstractPageFileFactory.Parameterizer.PAGE_SIZE_ID, 1000);
Database db = ClassGenericsUtil.parameterizeOrAbort(StaticArrayDatabase.class, params);
db.initialize();
GeneralizedDBSCAN dbscan = ClassGenericsUtil.parameterizeOrAbort(GeneralizedDBSCAN.class, params);
Relation<DoubleVector> rel = db.getRelation(TypeUtil.DOUBLE_VECTOR_FIELD);
Relation<ExternalID> relID = db.getRelation(TypeUtil.EXTERNALID);
DBIDRange ids = (DBIDRange) rel.getDBIDs();
Clustering<Model> result = dbscan.run(db);
int i =0;
for(Cluster<Model> clu : result.getAllClusters()) {
System.out.println("#" + i + ": " + clu.getNameAutomatic());
System.out.println("Size: " + clu.size());
System.out.print("Objects: ");
for(DBIDIter it = clu.getIDs().iter(); it.valid(); it.advance()) {
DoubleVector v = rel.get(it);
ExternalID exID = relID.get(it);
System.out.print("DoubleVec: ["+v+"]");
System.out.print("ExID: ["+exID+"]");
final int offset = ids.getOffset(it);
System.out.print(" " + offset);
}
System.out.println();
++i;
}
The ArrayAdapterDatabaseConnection produces two clusters; I just had to play around with the value of epsilon. When I set epsilon=0.008, DBSCAN started creating clusters, and when I set epsilon=0.04, all the items ended up in one cluster.
I have also tried to implement DatabaseConnection:
@Override
public MultipleObjectsBundle loadData() {
MultipleObjectsBundle bundle = new MultipleObjectsBundle();
List<Station> stations = getStations();
List<DoubleVector> vecs = new ArrayList<DoubleVector>();
List<ExternalID> ids = new ArrayList<ExternalID>();
for (Station s : stations){
String strID = Integer.toString(s.getId());
ExternalID i = new ExternalID(strID);
ids.add(i);
double[] st = {s.getLongitude(), s.getLatitude()};
DoubleVector dv = new DoubleVector(st);
vecs.add(dv);
}
SimpleTypeInformation<DoubleVector> type = new VectorFieldTypeInformation<>(DoubleVector.FACTORY, 2, 2, DoubleVector.FACTORY.getDefaultSerializer());
bundle.appendColumn(type, vecs);
bundle.appendColumn(TypeUtil.EXTERNALID, ids);
return bundle;
}
These long/lat values are associated with an ID and I need to link the clustered values back to that ID. Is using the ID offset (as in the code above) the only way to do that? I have tried to add an ExternalID column, but I don't know how to retrieve the ExternalID for a particular NumberVector.
Also, after seeing Using ELKI's Distance Function, I tried to use ELKI's longLatDistance but it doesn't work and I could not find any examples of how to use it.
The interface for data sources is called DatabaseConnection.
JavaDoc of DatabaseConnection
You can implement a MongoDB-based DatabaseConnection to get the data.
It is not a complicated interface; it has a single method.
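A minimal sketch of such a MongoDB-backed DatabaseConnection, modelled on the loadData() code shown above and assuming the 3.x MongoDB Java driver; the database, collection and field names ("geo", "stations", "id", "longitude", "latitude") are placeholders:

import java.util.ArrayList;
import java.util.List;
import org.bson.Document;
import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import de.lmu.ifi.dbs.elki.data.DoubleVector;
import de.lmu.ifi.dbs.elki.data.ExternalID;
import de.lmu.ifi.dbs.elki.data.type.TypeUtil;
import de.lmu.ifi.dbs.elki.data.type.VectorFieldTypeInformation;
import de.lmu.ifi.dbs.elki.datasource.DatabaseConnection;
import de.lmu.ifi.dbs.elki.datasource.bundle.MultipleObjectsBundle;

public class MongoDBConnection implements DatabaseConnection {
    @Override
    public MultipleObjectsBundle loadData() {
        List<DoubleVector> vecs = new ArrayList<>();
        List<ExternalID> ids = new ArrayList<>();
        MongoClient client = new MongoClient("localhost");
        MongoCollection<Document> coll = client.getDatabase("geo").getCollection("stations");
        for (Document doc : coll.find()) {
            ids.add(new ExternalID(doc.get("id").toString()));
            // one 2-dimensional vector per document: (longitude, latitude)
            vecs.add(new DoubleVector(new double[] { doc.getDouble("longitude"), doc.getDouble("latitude") }));
        }
        client.close();
        MultipleObjectsBundle bundle = new MultipleObjectsBundle();
        bundle.appendColumn(new VectorFieldTypeInformation<>(DoubleVector.FACTORY, 2, 2,
                DoubleVector.FACTORY.getDefaultSerializer()), vecs);
        bundle.appendColumn(TypeUtil.EXTERNALID, ids);
        return bundle;
    }
}

You can then pass an instance of this class as the DATABASE_CONNECTION_ID parameter, exactly as the MyDBConnection instance is passed in the clustering code above.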

How can I read Excel file into List<Map<String, String>>

How can I read an Excel file into a List<Map<String, String>>, where I first need to get the column names from the Excel sheet and then get all the values and put them into the List<Map<String, String>>? I want to be able to read a value from this collection like this: String columnValue = map.get("column_name");
You can export the data from Excel in CSV format and use Apache's CSV parser library for Java (Commons CSV).
After adding the library to your classpath you can simply use it like this:
Reader in = new FileReader("path/to/file.csv");
// withHeader() with no arguments tells the parser to take the column names from the first record
Iterable<CSVRecord> records = CSVFormat.EXCEL.withHeader().parse(in);
for (CSVRecord record : records) {
    String lastName = record.get("Last Name");
    String firstName = record.get("First Name");
}
After this you can store data in your List in desired way.
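For example, continuing the snippet above (with java.util.List, ArrayList and Map imported), CSVRecord.toMap() gives you the header-to-value map for each row, which builds exactly the List<Map<String, String>> the question asks for:

List<Map<String, String>> rows = new ArrayList<>();
for (CSVRecord record : records) {
    rows.add(record.toMap()); // column name -> cell value for this row
}
String columnValue = rows.get(0).get("column_name");

Note that the Iterable returned by parse() can only be traversed once, so build the list in a single pass.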
For more information look into
https://commons.apache.org/proper/commons-csv/index.html

Updating mongodb with java driver takes forever?

So this is the case: I have a program that takes two large CSV files, finds the diffs and then sends an array list to a method that is supposed to update MongoDB with the lines from the array. The problem is that the updates take forever. A test case with 5000 updates takes 36 minutes. Is this normal?
The update(List<String> changes) method looks something like this:
mongoClient = new MongoClient(ip);
db = mongoClient.getDB("foo");
collection = db.getCollection("bar");
//for each line of change
for (String s : changes) {
    //splits the csv-lines on ;
    String[] fields = s.split(";");
    //identifies which document in the database should be updated
    long id = Long.parseLong(fields[0]);
    BasicDBObject sq = new BasicDBObject().append("organizationNumber", id);
    //creates a new unit-object, that is converted to JSON and then inserted into the database
    Unit u = new Unit(fields);
    Gson gson = new Gson();
    String jsonObj = gson.toJson(u);
    DBObject objectToUpdate = collection.findOne(sq);
    DBObject newObject = (DBObject) JSON.parse(jsonObj);
    if (objectToUpdate != null) {
        objectToUpdate.putAll(newObject);
        collection.save(objectToUpdate);
    }
}
That's because you are taking extra steps to update.
You don't need to parse JSONs manually and you don't have to do the query-then-update when you can just do an update with a "where" clause in a single step.
Something like this:
BasicDBObject query = new BasicDBObject().append("organizationNumber", id);
Unit unit = new Unit(fields);
BasicDBObject unitDB = new BasicDBObject().append("someField", unit.getSomeField()).append("otherField", unit.getOtherField());
collection.update(query, new BasicDBObject("$set", unitDB));
Here query specifies the "where" clause and the $set document lists the fields that need to be updated (without $set the matched document would be replaced wholesale).
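If it is still slow after removing the extra query, batching the writes should also help. A rough sketch using the legacy driver's bulk API (available from driver 2.12 onwards, to the best of my recollection), which sends the updates to the server in batches instead of one round trip per line; the field names are illustrative:

BulkWriteOperation bulk = collection.initializeUnorderedBulkOperation();
for (String s : changes) {
    String[] fields = s.split(";");
    long id = Long.parseLong(fields[0]);
    BasicDBObject query = new BasicDBObject("organizationNumber", id);
    // map whatever fields the Unit actually exposes
    BasicDBObject unitDB = new BasicDBObject("someField", new Unit(fields).getSomeField());
    bulk.find(query).updateOne(new BasicDBObject("$set", unitDB));
}
bulk.execute();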

Why I insert double/float column into Cassandra by hector and got incorrect value int database

I have a question about inserting double/float data into Cassandra with Hector. The values get corrupted:
new Double("13.45")------->13.468259733915328
new Float("64.13") ------->119.87449
When I insert the data into Cassandra with Hector:
TestDouble ch = new TestDouble("talend_bj",
"localhost:9160");
String family = "talend_1";
ch.ensureColumnFamily(family);
List values = new ArrayList();
values.add(HFactory.createColumn("id", 2, StringSerializer.get(),
IntegerSerializer.get()));
values.add(HFactory.createColumn("name", "zhang",
StringSerializer.get(), StringSerializer.get()));
values.add(HFactory.createColumn("salary", 13.45,
StringSerializer.get(), DoubleSerializer.get()));
ch.insertSuper("14", values, "user1", family, StringSerializer.get(),
StringSerializer.get());
StringSerializer se = StringSerializer.get();
MultigetSuperSliceQuery<String, String, String, String> q = me.prettyprint.hector.api.factory.HFactory
.createMultigetSuperSliceQuery(ch.getKeyspace(), se, se, se, se);
// q.setSuperColumn("user1").setColumnNames("id","name")
q.setKeys("12", "11","13", "14");
q.setColumnFamily(family);
q.setRange("z", "z", false, 100);
QueryResult<SuperRows<String, String, String, String>> r = q
.setColumnNames("user1", "user").execute();
Iterator iter = r.get().iterator();
while (iter.hasNext()) {
SuperRow superRow = (SuperRow) iter.next();
SuperSlice s = superRow.getSuperSlice();
List<HSuperColumn> superColumns = s.getSuperColumns();
for (HSuperColumn superColumn : superColumns) {
List<HColumn> columns = superColumn.getColumns();
System.out.println(DoubleSerializer.get().fromBytes(((String) superColumn.getSubColumnByName("salary").getValue()).getBytes()));
}
}
You can see that I inserted 13.45, but the column value I read back is 13.468259733915328.
You should break the problem in two. After writing, IF you defined that part of your schema OR used the ASSUME keyword on the command-line cli, view the data in Cassandra to see if it is correct. PlayOrm has this EXACT unit test (in PlayOrm on top of Astyanax, not Hector) and it works just fine. Notice the comparison in the test of -200.23.
https://github.com/deanhiller/playorm/blob/master/input/javasrc/com/alvazan/test/TestColumnSlice.java
Once done, does your data in Cassandra look correct? If so, the issue is in how you read the value in your code; otherwise, it is in the writes.
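On the reading side, the suspicious line is the one that calls getValue() on the "salary" subcolumn: because the query was declared with StringSerializer for the values, the raw double bytes are decoded into a String and then re-encoded with getBytes(), which is lossy. A hedged sketch of reading the raw bytes instead (assuming Hector's HColumn exposes getValueBytes(), as I recall it does):

HColumn salaryCol = superColumn.getSubColumnByName("salary");
// deserialize the raw column bytes directly as a double instead of round-tripping through a String
double salary = DoubleSerializer.get().fromByteBuffer(salaryCol.getValueBytes());
System.out.println(salary); // should print 13.45 if the write was correct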
