HBase loading .opv and .ope gives hexadecimal output - Java

I'm using Oracle Big Data Spatial & Graph v2.5 and following the official guide to load a graph into HBase through Java.
This is my code:
public class Main {
    public static void main(String[] arg) throws Exception {
        org.apache.log4j.BasicConfigurator.configure();
        OraclePropertyGraphDataLoader opgdl = OraclePropertyGraphDataLoader.getInstance();
        String vfile = "/root/oracle_property_files/connections.opv";
        String efile = "/root/oracle_property_files/connections.ope";
        PgHbaseGraphConfig cfg = GraphConfigBuilder.forPropertyGraphHbase()
                .setName("config").setZkQuorum("zk01node,zk02node,zk03node").build();
        OraclePropertyGraph opg = OraclePropertyGraph.getInstance(cfg);
        opgdl.loadData(opg, vfile, efile, 48);
    }
}
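For reference, these are the imports the snippet relies on (package names as I know them from the Oracle property graph for HBase Javadoc; treat them as an assumption for v2.5):
import oracle.pg.hbase.OraclePropertyGraph;
import oracle.pg.hbase.OraclePropertyGraphDataLoader;
import oracle.pgx.config.GraphConfigBuilder;
import oracle.pgx.config.PgHbaseGraphConfig;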
This is my .opv file:
1,name,1,Alice,,
1,age,2,,31,
2,name,1,Bob,,
2,age,2,,27,
And this is my .ope file:
1,1,2,knows,type,1,friends,,
My code creates the following tables on HBase (the trailing dot is part of each table name, as the scan below confirms):
configEI.
configGE.
configIT.
configVI.
configVT.
The problem is that if I launch the command scan 'configVT.', the output is a mix of hexadecimal and ASCII values:
hbase(main):003:0> scan 'configVT.'
ROW COLUMN+CELL
3v\x93ur|\xD7\xD3\x00\x00\x00\x00\x00\x00\x00\x02 column=v:i\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01, timestamp=1624009988902, value=knows
3v\x93ur|\xD7\xD3\x00\x00\x00\x00\x00\x00\x00\x02 column=v:kage, timestamp=1624009989001, value=\x00\x00\x00\x1B\x02
3v\x93ur|\xD7\xD3\x00\x00\x00\x00\x00\x00\x00\x02 column=v:kname, timestamp=1624009989001, value=Bob\x01
\xCB\xFC%\xA7qt\x02\x84\x00\x00\x00\x00\x00\x00\x00 column=v:kage, timestamp=1624009988909, value=\x00\x00\x00\x1F\x02
\x01
\xCB\xFC%\xA7qt\x02\x84\x00\x00\x00\x00\x00\x00\x00 column=v:kname, timestamp=1624009988909, value=Alice\x01
\x01
\xCB\xFC%\xA7qt\x02\x84\x00\x00\x00\x00\x00\x00\x00 column=v:o\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x01, timestamp=1624009988909, value=knows
\x01
2 row(s) in 0.0490 seconds
I would like to have a more readable result.
Edit: it seems that String and Date values are stored almost correctly (apart from a trailing escape byte, as in Alice\x01). The integers, instead, show up entirely in hex: \x00\x00\x00\x1F is the big-endian encoding of 31 (Alice's age) and \x00\x00\x00\x1B of 27 (Bob's age).
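A minimal sketch decoding one of those cells with HBase's Bytes utility (the trailing \x02 byte is, I assume, a type marker appended by the loader, so it is dropped here):

import org.apache.hadoop.hbase.util.Bytes;

// \x00\x00\x00\x1B from the scan above is the big-endian encoding of 27
int age = Bytes.toInt(new byte[]{0x00, 0x00, 0x00, 0x1B});
System.out.println(age); // prints 27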

I figured it out. With the scan command I was reading the tables as plain HBase tables, but they aren't: they are Oracle Big Data Spatial & Graph tables stored in HBase. So my configVT. table is only one of the five tables created by the Java method opgdl.loadData, and scanning just that one is not enough.
In order to get readable results, I should read the data as edges and vertices instead:
opg.getVertices().forEach(v -> {
    System.out.println("id vertex: " + v.getId());
    v.getPropertyKeys().forEach(p -> {
        System.out.println("property: " + p);
        System.out.println("value: " + v.getProperty(p));
    });
});
opg.getEdges().forEach(e -> {
    System.out.println("label: " + e.getLabel());
    System.out.println("id edge: " + e.getId());
    Vertex vIn = e.getVertex(Direction.IN);
    Vertex vOut = e.getVertex(Direction.OUT);
    System.out.println("edge from: " + vOut.getId());
    System.out.println("edge to: " + vIn.getId());
    e.getPropertyKeys().forEach(p -> {
        System.out.println("property: " + p);
        System.out.println("value: " + e.getProperty(p));
    });
});
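With the sample .opv and .ope files above, this should print something along these lines (iteration order may vary):
id vertex: 1
property: name
value: Alice
property: age
value: 31
id vertex: 2
property: name
value: Bob
property: age
value: 27
label: knows
id edge: 1
edge from: 1
edge to: 2
property: type
value: friends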

Related

Read and write to different Mongo collections using Spark with Java

I am a relative newbie to Spark. I need to read from a Mongo collection in Java using Spark, change some field values (say, appending "123" to one field's value), and write the result into another collection. Accordingly, I have two separate Mongo URIs, the input and output URIs, configured in Spark. I then proceed to read from the input collection. What I am not understanding, however, is how I would write the same RDD of documents out to the other collection. This is the input code:
String inputUri = "mongodb://" + kp.getProperty("source.mongo.userid") + ":"
+ Encryptor.decrypt(kp.getProperty("source.mongo.cache")) + "#"
+ kp.getProperty("source.mongo.bootstrap-servers") + "/" + kp.getProperty("source.mongo.database")
+ "." + kp.getProperty("source.mongo.inputCollection") + "?ssl=true&connectTimeoutMS="
+ kp.getProperty("source.mongo.connectTimeoutMS") + "&socketTimeoutMS="
+ kp.getProperty("source.mongo.socketTimeoutMS") + "&maxIdleTimeMS="
+ kp.getProperty("source.mongo.maxIdleTimeMS");
String outputUri = "mongodb://" + kp.getProperty("source.mongo.userid") + ":"
+ Encryptor.decrypt(kp.getProperty("source.mongo.cache")) + "#"
+ kp.getProperty("source.mongo.bootstrap-servers") + "/" + kp.getProperty("source.mongo.database")
+ "." + kp.getProperty("source.mongo.outputCollection") + "?ssl=true&connectTimeoutMS="
+ kp.getProperty("source.mongo.connectTimeoutMS") + "&socketTimeoutMS="
+ kp.getProperty("source.mongo.socketTimeoutMS") + "&maxIdleTimeMS="
+ kp.getProperty("source.mongo.maxIdleTimeMS");
SparkSession spark = SparkSession.builder().master("local[3]").appName(kp.getProperty("spark.app.name"))
.config("spark.mongodb.input.uri", inputUri)
.config("spark.mongodb.output.uri", outputUri)
...;
JavaSparkContext sc = new JavaSparkContext(spark.sparkContext());
JavaMongoRDD<Document> rdd = MongoSpark.load(sc);
System.out.println("Count: " + rdd.count());
System.out.println(rdd.first().toJson());
Please help me in this regard.
I have got the answer myself. I went the Dataset route instead of RDDs, which made the modification simpler. So, to load the Mongo collection, I use
Dataset<Row> df = MongoSpark.load(sc).toDF();
Then I create a temporary view on it in order to be able to use Spark SQL:
df.createOrReplaceTempView("Customer");
I register a UDF for operating on each column value:
spark.udf().register("Test", new TestUDF(), DataTypes.StringType);
The UDF definition is as follows:
public class TestUDF implements UDF1<String, String> {
    @Override
    public String call(String customer) throws Exception {
        return customer + "123";
    }
}
Then I call the UDF using the same column name as the original so that the values in the original dataset are replaced:
df = df.withColumn("CustomerName", functions.callUDF("Test", functions.col("CustomerName")));
Then I write it back to Mongo in a separate collection:
MongoSpark.write(df).option("collection", "myCollection").save();
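For completeness, the RDD route the question originally asked about should also work; here is a hedged sketch (sc is the JavaSparkContext from the question, "CustomerName" and "myCollection" are as above, and WriteConfig.create(...).withOptions(...) plus MongoSpark.save(rdd, writeConfig) are the connector calls I believe apply):

import java.util.HashMap;
import java.util.Map;
import org.apache.spark.api.java.JavaRDD;
import org.bson.Document;
import com.mongodb.spark.MongoSpark;
import com.mongodb.spark.config.WriteConfig;
import com.mongodb.spark.rdd.api.java.JavaMongoRDD;

// Read from the input collection and transform each Document in place.
JavaMongoRDD<Document> rdd = MongoSpark.load(sc);
JavaRDD<Document> transformed = rdd.map(doc -> {
    doc.put("CustomerName", doc.getString("CustomerName") + "123");
    return doc;
});

// spark.mongodb.output.uri supplies the defaults; override only the collection.
Map<String, String> overrides = new HashMap<>();
overrides.put("collection", "myCollection");
MongoSpark.save(transformed, WriteConfig.create(sc).withOptions(overrides));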

Simple CoreNLP: ClassNotFoundException

My Simple CoreNLP code works with a main method, as shown below.
package com.books.servlet;

import edu.stanford.nlp.simple.Document;
import edu.stanford.nlp.simple.Sentence;

public class SimpleCoreNLPDemo {
    public static void main(String[] args) {
        // Create a document. No computation is done yet.
        Document doc = new Document("add your text here! It can contain multiple sentences.");
        for (Sentence sent : doc.sentences()) {
            // Will iterate over two sentences
            // We're only asking for words -- no need to load any models yet
            System.out.println("The second word of the sentence '" + sent + "' is " + sent.word(1));
            // When we ask for the lemma, it will load and run the part of speech tagger
            System.out.println("The third lemma of the sentence '" + sent + "' is " + sent.lemma(2));
            // When we ask for the parse, it will load and run the parser
            System.out.println("The parse of the sentence '" + sent + "' is " + sent.parse());
        }
    }
}
Then I used the same code in my web application, as below. When I execute it, I get a ClassNotFoundException.
My web app code:
public void me() {
    Document doc = new Document("add your text here! It can contain multiple sentences.");
    for (Sentence sent : doc.sentences()) {
        // Will iterate over two sentences
        // We're only asking for words -- no need to load any models yet
        System.out.println("The second word of the sentence '" + sent + "' is " + sent.word(1));
        // When we ask for the lemma, it will load and run the part of speech tagger
        System.out.println("The third lemma of the sentence '" + sent + "' is " + sent.lemma(2));
        // When we ask for the parse, it will load and run the parser
        System.out.println("The parse of the sentence '" + sent + "' is " + sent.parse());
    }
}
I have downloaded all the jar files and added them to the build path. It works fine with the main method.
I just resolved the problem. I copied all my Stanford CoreNLP jar files to the directory
/WEB-INF/lib
and now my code works fine. (A servlet container loads classes only from WEB-INF/classes and WEB-INF/lib, so jars that are merely on the IDE build path are invisible to the deployed web app.) Below is my simple method and its output for your information.
public String s = "I like java and python";
public static Set<String> nounPhrases = new HashSet<>();

public void me() {
    Document doc = new Document(" " + s);
    for (Sentence sent : doc.sentences()) {
        System.out.println("The parse of the sentence '" + sent + "' is " + sent.parse());
    }
}
Output:
The parse of the sentence 'I like java and python' is (ROOT (S (NP (PRP I)) (VP (VBP like) (NP (NN java) (CC and) (NN python)))))

Android SQLite Database - Not reading decimal part of float from database

This is my first SQLite database with a float. I can't figure out why I am unable to store/retrieve the decimal parts of a float.
The database is defined as:
@Override
public void onCreate(SQLiteDatabase db) {
    // Create a string that contains the SQL statement to create the Nbmc device table
    String SQL_CREATE_NBMC_TEMP_DATA_TABLE = "CREATE TABLE " + NbmcContract.NmbcTempData.TABLE_NAME + " ("
            + NbmcContract.NmbcTempData._ID + " INTEGER PRIMARY KEY AUTOINCREMENT, "
            + NbmcContract.NmbcTempData.COLUMN_TIMESTAMP + " TEXT NOT NULL, "
            + NbmcContract.NmbcTempData.COLUMN_DATA_FLOAT + " REAL) ";
    db.execSQL(SQL_CREATE_NBMC_TEMP_DATA_TABLE);
}
I store floating point data in it from a service activity:
private static double lastSensorTempReading;

// ============ TEMP ==================
else if (UUID_SENSOR_FFF2.equals(characteristic.getUuid())) {
    rxSensorDataType = FFF2_TEMP_CONST;
    descStringBuilder.append("Elapsed Time: " + timeFormat.format(timeDiff) + "\n");
    // temp comes in two bytes: newData[MSB], newData[LSB]
    // temp = MSB + (0.1 * LSB)
    int iTempMsb_i = (int) newData[0] & 0xff;
    int iTempLsb_i = (int) newData[1] & 0xff;
    lastSensorTempReading = (float) iTempMsb_i + (0.10 * (float) iTempLsb_i);
    Log.v("broadcastUpdate", "Temp = " + lastSensorTempReading);
    // Add this data to the temp Db
    tempValues.put(NbmcContract.NmbcTempData.COLUMN_TIMESTAMP, estimatedTime);
    tempValues.put(NbmcContract.NmbcTempData.COLUMN_DATA_FLOAT, lastSensorTempReading);
    newRowId = db_temp.insert(NbmcContract.NmbcTempData.TABLE_NAME, null, tempValues);
}
And when I use Log.v to dump the value I think I am storing, it looks correct (it also looks correct when I send it to the MainActivity via an intent):
V/broadcastUpdate: Temp = 33.3
However, when I read it back from the SQLite database in my MainActivity, I lose the part of the float/double that follows the decimal point, and I'm not getting any errors in Logcat.
sb.append(" ------------------- Temperature Data -------------------------\n");
nbmcTempDbHelper = new NbmcTempDataDbHelper(this.getApplicationContext());
SQLiteDatabase tmpDb = nbmcTempDbHelper.getReadableDatabase();
c = tmpDb.rawQuery(" SELECT " + NbmcContract.NmbcTempData._ID + ", "
        + NbmcContract.NmbcTempData.COLUMN_TIMESTAMP + ", "
        + NbmcContract.NmbcTempData.COLUMN_DATA_FLOAT +
        " FROM " + NbmcContract.NmbcTempData.TABLE_NAME +
        " LIMIT " + MAX_RESULTS_RETRIEVED + " OFFSET " + 0, null);
try {
    if (c != null) {
        if (c.moveToFirst()) {
            do {
                String tempRowId = c.getString(c.getColumnIndexOrThrow(NbmcContract.NmbcTempData._ID));
                String tempTimeString = c.getString(c.getColumnIndexOrThrow(NbmcContract.NmbcTempData.COLUMN_TIMESTAMP));
                double tempDataDbl = c.getInt(c.getColumnIndexOrThrow(NbmcContract.NmbcTempData.COLUMN_DATA_FLOAT));
                Log.v("getEmailText", "Temp reading = " + tempDataDbl);
                sb.append(tempRowId);
                sb.append(DELIMITER);
                sb.append(tempTimeString);
                sb.append(DELIMITER);
                sb.append(tempDataDbl);
                sb.append(NEW_LINE);
            } while (c.moveToNext());
        }
    }
} finally {
    c.close();
    tmpDb.close();
}
V/getEmailText: Temp reading = 30.0
V/getEmailText: Temp reading = 30.0
V/getEmailText: Temp reading = 30.0
V/getEmailText: Temp reading = 30.0
The problem is in this line:
double tempDataDbl = c.getInt(c.getColumnIndexOrThrow(NbmcContract.NmbcTempData.COLUMN_DATA_FLOAT));
While you are saving a double, you are retrieving an int. Just change the line to:
double tempDataDbl = c.getDouble(c.getColumnIndexOrThrow(NbmcContract.NmbcTempData.COLUMN_DATA_FLOAT));
Unfortunately, AFAIK there is no way to get a type-mismatch error. If you read through Data Types in SQLite, it says:
In SQLite, the datatype of a value is associated with the value itself, not with its container. The dynamic type system of SQLite is backwards compatible with the more common static type systems of other database engines in the sense that SQL statements that work on statically typed databases should work the same way in SQLite. However, the dynamic typing in SQLite allows it to do things which are not possible in traditional rigidly typed databases.
Since any column (an INTEGER or REAL one in your case) can hold any and all kinds of data types, not even the database knows which type you intend to read; it is the Cursor getter you call that determines the conversion.
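A minimal sketch illustrating that coercion (assuming an open SQLiteDatabase db; the log tag is arbitrary):

// The stored value is a REAL; the getter you call decides what comes back.
Cursor c = db.rawQuery("SELECT 33.3", null);
if (c.moveToFirst()) {
    Log.v("typeDemo", "getInt:    " + c.getInt(0));    // 33  -- fraction silently dropped
    Log.v("typeDemo", "getDouble: " + c.getDouble(0)); // 33.3
}
c.close();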
Yes, the issue is at fetch time; there is a problem in this line of your code:
double tempDataDbl = c.getInt(c.getColumnIndexOrThrow(NbmcContract.NmbcTempData.COLUMN_DATA_FLOAT));
Change it to:
double tempDataDbl = c.getDouble(c.getColumnIndexOrThrow(NbmcContract.NmbcTempData.COLUMN_DATA_FLOAT));

Retrieve Connected Components Graphstream

I am using GraphStream in a project, and my problem is that when I try to retrieve the list of connected components, I only get their count or, at the very best, their ids.
I have tried this code, but it doesn't return anything:
ConnectedComponents cc = new ConnectedComponents();
cc.init(graph);
System.out.println("List of Connected Components :");
for (ConnectedComponent conn : cc) {
    System.out.println("Component " + conn.id + " :");
    System.out.println("--------------");
    for (Node n : conn.getEachNode()) {
        Object[] attr = n.getAttribute("xy");
        Double x = (Double) attr[0];
        Double y = (Double) attr[1];
        System.out.println(x + " , " + y);
    }
}
The nodes have an attribute "xy" which contains the coordinates stored as Double[].
What did I do wrong? And how can I fix it?
ConnectedComponents was rewritten in a commit on 2015-12-15; there was a problem with retrieving the contents of components.
If you are not using the git version of GraphStream, maybe you should give it a try.
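If switching to the git version is not an option, a possible workaround is to group the nodes yourself; a sketch assuming the 1.x setCountAttribute API, which tags every node with the id of its component:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.graphstream.algorithm.ConnectedComponents;
import org.graphstream.graph.Node;

ConnectedComponents cc = new ConnectedComponents();
cc.setCountAttribute("component");   // each node gets a "component" attribute
cc.init(graph);

// Group nodes by the component id that the algorithm wrote onto them.
Map<Object, List<Node>> components = new HashMap<>();
for (Node n : graph.getEachNode()) {
    components.computeIfAbsent(n.getAttribute("component"), k -> new ArrayList<>()).add(n);
}
components.forEach((id, nodes) ->
        System.out.println("Component " + id + " has " + nodes.size() + " nodes"));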

After extracting data from HTML using a for loop, how do I insert it one by one into a database?

I have extracted multiple pieces of data from HTML pages using Jsoup, and now I am trying to insert them one by one into a Derby DB using JDBC on NetBeans.
Here is my code:
public String nameOf() {
    String nameStr = null;
    String nameResults = "";
    for (int j = 100; j <= 110; j++) {
        refNum = j;
        //System.out.println("Reference Number: " + refNum);
        try {
            // crawl and parse HTML from definition and causes page
            Document docDandC = Jsoup.connect("http://www.abcd.edu/encylopedia/article/000" + refNum + ".htm").get();
            // scrape name data
            Elements name = docDandC.select("title");
            nameStr = name.get(0).text();
            //System.out.println(nameStr);
            nameResults += nameStr + " ";
        } catch (Exception e) {
            //System.out.println("Reference number " + refNum + " does not exist.");
        }
    }
    return nameResults;
}
So this method takes the names of diseases from the pages for reference numbers 100 to 110. What I am trying to do is insert one name at a time into the Derby DB that I have created, using JDBC. I have everything set up; all I have left to do is insert each name into the name field of a table named DISEASE (which has fields: id, name, etc.).
nameResults += nameStr + " ";
This part worries me as well, since some disease names can have multiple words. Maybe I should use a list of some sort?
Please help! Thanks in advance.
Something like:
public List<String> nameOf() {
    ...
    List<String> nameResults = new ArrayList<String>();
    ...
    nameResults.add(nameStr);
    ...
    return nameResults;
}
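From there, inserting each name into the DISEASE table is a loop over the returned list. A hedged sketch using a plain JDBC PreparedStatement (the Derby connection URL and credentials are placeholders, and id is assumed to be auto-generated):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

public void insertNames(List<String> names) throws Exception {
    // URL/credentials are assumptions; point them at your NetBeans-created Derby DB.
    try (Connection conn = DriverManager.getConnection("jdbc:derby://localhost:1527/diseaseDB", "user", "password");
         PreparedStatement ps = conn.prepareStatement("INSERT INTO DISEASE (NAME) VALUES (?)")) {
        for (String name : names) {
            ps.setString(1, name);  // multi-word names are fine: each is bound as one string
            ps.addBatch();
        }
        ps.executeBatch();
    }
}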
