I recently developed a "classic" 3-tier web application using Java EE.
I used GlassFish as the application server, MS SQL Server as the DBMS, and XHTML pages with PrimeFaces components for the front end.
Now, for educational purposes, I want to substitute the relational db with a pure triplestore database but I'm not sure about the procedure to follow.
I've searched a lot on Google and on this site, but I didn't find what I was looking for, because every answer I found was more theoretical than practical.
If possible, I need a sort of tutorial or some practical tips.
I've read the documentation about Apache Jena but I'm not able to find a solid starting point.
In particular:
- In order to use MS SQL Server with GlassFish, I used a JDBC driver, created a datasource and a connection pool. Is there an equivalent procedure to set up a triplestore database?
- To handle user authentication, I used a Realm. What should I do now?
For the moment, I've created an RDF schema by hand and translated it into a Java class using Jena schemagen. What should I do now?
After several attempts and more research on the net, I finally achieved my goal.
I decided to develop a hybrid solution in which I manage user login and navigation permissions via MS SQL Server and a JDBCRealm, while I use Jena TDB to store all the other data.
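Setting up TDB needs no JDBC driver, datasource, or connection pool: you just point Jena at a directory on disk. One way to do the setup once per application (only a sketch; the listener class, attribute key, and org.apache.jena package names are illustrative and assume a servlet-based deployment on GlassFish) is a ServletContextListener that opens the dataset at startup and closes it at shutdown:

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;
import org.apache.jena.query.Dataset;
import org.apache.jena.tdb.TDBFactory;

@WebListener
public class TdbLifecycleListener implements ServletContextListener {

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        // Open (or create) the TDB dataset once and share it via the servlet context
        Dataset dataset = TDBFactory.createDataset("C:\\MyTDBdataset");
        sce.getServletContext().setAttribute("tdbDataset", dataset);
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        Dataset dataset = (Dataset) sce.getServletContext().getAttribute("tdbDataset");
        if (dataset != null) {
            dataset.close();
        }
    }
}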
Starting with an RDF schema, I created a Java class that contains resources and properties to easily create my statements via code. Here's an example:
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns="http://www.stackoverflow.com/example#"
    xml:base="http://www.stackoverflow.com/example">
  <rdfs:Class rdf:ID="User"/>
  <rdfs:Class rdf:ID="Project"/>
  <rdf:Property rdf:ID="email"/>
  <rdf:Property rdf:ID="name"/>
  <rdf:Property rdf:ID="surname"/>
  <rdf:Property rdf:ID="description"/>
  <rdf:Property rdf:ID="customer"/>
  <rdf:Property rdf:ID="insertProject">
    <rdfs:domain rdf:resource="http://www.stackoverflow.com/example#User"/>
    <rdfs:range rdf:resource="http://www.stackoverflow.com/example#Project"/>
  </rdf:Property>
</rdf:RDF>
And this is the Java class:
public class MY_ONTOLOGY {

    private static final OntModel M = ModelFactory.createOntologyModel(OntModelSpec.RDFS_MEM);

    private static final String NS = "http://www.stackoverflow.com/example#";
    private static final String BASE_URI = "http://www.stackoverflow.com/example/";

    public static final OntClass USER = M.createClass(NS + "User");
    public static final OntClass PROJECT = M.createClass(NS + "Project");

    public static final OntProperty EMAIL = M.createOntProperty(NS + "email");
    public static final OntProperty NAME = M.createOntProperty(NS + "name");
    public static final OntProperty SURNAME = M.createOntProperty(NS + "surname");
    public static final OntProperty DESCRIPTION = M.createOntProperty(NS + "description");
    public static final OntProperty CUSTOMER = M.createOntProperty(NS + "customer");
    public static final OntProperty INSERTS_PROJECT = M.createOntProperty(NS + "insertProject");

    public static String getBaseURI() {
        return BASE_URI;
    }
}
Then I created a directory on my PC where I want to store the data, such as C:\MyTDBdataset.
To store data inside it, I use the following code:
String directory = "C:\\MyTDBdataset";
Dataset dataset = TDBFactory.createDataset(directory);
dataset.begin(ReadWrite.WRITE);
try {
    Model m = dataset.getDefaultModel();
    Resource user = m.createResource(MY_ONTOLOGY.getBaseURI() + "Ronnie", MY_ONTOLOGY.USER);
    user.addProperty(MY_ONTOLOGY.NAME, "Ronald");
    user.addProperty(MY_ONTOLOGY.SURNAME, "Red");
    user.addProperty(MY_ONTOLOGY.EMAIL, "ronnie@myemail.com");
    Resource project = m.createResource(MY_ONTOLOGY.getBaseURI() + "MyProject", MY_ONTOLOGY.PROJECT);
    project.addProperty(MY_ONTOLOGY.DESCRIPTION, "This project is fantastic");
    project.addProperty(MY_ONTOLOGY.CUSTOMER, "Customer & Co");
    m.add(user, MY_ONTOLOGY.INSERTS_PROJECT, project);
    dataset.commit();
} finally {
    dataset.end();
}
If I want to read statements in my TDB, I can use something like this:
dataset.begin(ReadWrite.READ);
try {
    Model m = dataset.getDefaultModel();
    StmtIterator iter = m.listStatements();
    while (iter.hasNext()) {
        Statement stmt = iter.nextStatement();
        Resource subject = stmt.getSubject();
        Property predicate = stmt.getPredicate();
        RDFNode object = stmt.getObject();
        System.out.println(subject);
        System.out.println("\t" + predicate);
        System.out.println("\t\t" + object);
        System.out.println("");
    }
    m.write(System.out, "RDF/XML"); // prints the whole model to the console as RDF/XML
} finally {
    dataset.end();
}
If you want to navigate your model in different ways, look at this tutorial provided by Apache.
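For instance, instead of iterating over all statements you can run a SPARQL query against the same dataset. This is only a sketch: the query simply follows the example schema above, and the query classes come from the org.apache.jena.query package (com.hp.hpl.jena.query on older Jena releases):

dataset.begin(ReadWrite.READ);
try {
    String sparql = "SELECT ?user ?name "
            + "WHERE { ?user <http://www.stackoverflow.com/example#name> ?name }";
    try (QueryExecution qExec = QueryExecutionFactory.create(sparql, dataset.getDefaultModel())) {
        ResultSet results = qExec.execSelect();
        ResultSetFormatter.out(results); // print the result table to the console
    }
} finally {
    dataset.end();
}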
If you want to remove specific statements in your model, you can write something like this:
dataset.begin(ReadWrite.WRITE);
try {
    Model m = dataset.getDefaultModel();
    m.remove(m.createResource("http://www.stackoverflow.com/example/Ronnie"),
             MY_ONTOLOGY.NAME, m.createLiteral("Ronald"));
    dataset.commit();
} finally {
    dataset.end();
}
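If you instead want to drop every statement whose subject is a given resource, Model also provides removeAll, where null acts as a wildcard; here is a short sketch using the same example resource:

dataset.begin(ReadWrite.WRITE);
try {
    Model m = dataset.getDefaultModel();
    // null matches any predicate and any object
    m.removeAll(m.createResource("http://www.stackoverflow.com/example/Ronnie"), null, null);
    dataset.commit();
} finally {
    dataset.end();
}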
That's all! Bye!
I created an ontology model using Protégé,
then I used Java to populate my ontology (create users, resources, etc.),
and I save all modifications to a file.
Now I need to integrate an RDF server to save changes.
After some research I found that Fuseki is one of the best servers I can use.
After some more research I also found that I need to use RDFConnection to communicate with the Fuseki server, but I am having some difficulties integrating the server and manipulating all of my Java classes.
To query my ontology, I used RDFConnection. For example:
public static void main(String[] args) {
    RDFConnection conn1 = RDFConnectionFactory.connect("http://localhost:3030/test/");
    try (QueryExecution qExec =
            conn1.query("PREFIX ex: <http://example.org/> SELECT * { ?s ?p ?o }")) {
        ResultSet rs = qExec.execSelect();
        ResultSetFormatter.out(rs, qExec.getQuery());
    }
}
But I am running into issues trying to create the Agent (user) or other resources.
Below you will find part of my Java code:
private final OntModel onto;
private OntModel inferred;

public test() {
    onto = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);
    OntDocumentManager manager = onto.getDocumentManager();
    manager.addAltEntry("http://www-test/1.0.0", "ontologies/test.owl");
}

public String createUri(String prefix, String localName) {
    String uri = prefix + "#" + localName;
    uri = uri.replaceAll(" ", "_");
    return uri;
}

// to create an Agent
public Resource createAgent(String uri) throws AlreadyExistingRdfResourceException {
    Resource agent = this.createEntity(uri);
    if (agent == null) return null;
    Statement s = ResourceFactory.createStatement(agent, RDF.type,
            onto.getIndividual(EngineConstants.CD_Agent));
    onto.add(s);
    this.synchronize();
    return agent;
}

// to get an Agent's activities
public Set<Resource> getAgentActivities(String agentUri) {
    final String query = "SELECT ?entity WHERE { ?entity CD:hasAgent <" + agentUri + "> }";
    ResultSet resultSet = this.queryExec(this.getInferred(), query);
    return this.getRdfResources(resultSet, "entity");
}
I need to know if someone can help me and give me an example of how I can use and integrate Fuseki to modify and query my ontology.
Thank you for your help.
Note you probably first want to retrieve your graph using the fetch() method - http://jena.apache.org/documentation/javadoc/rdfconnection/org/apache/jena/rdfconnection/RDFDatasetAccessConnection.html#fetch-java.lang.String- - which will be more efficient than querying for it as you do now e.g.
Model model = connection.fetch("http://your-graph-name");
If you are just using the default graph you can just do connection.fetch() to retrieve that.
Once you have the local copy, modify it with the Jena APIs as you desire.
You can then use the put() method to update a graph - http://jena.apache.org/documentation/javadoc/rdfconnection/org/apache/jena/rdfconnection/RDFConnection.html#put-java.lang.String-org.apache.jena.rdf.model.Model- - with your local changes e.g.
connection.put("http://your-graph-name", model);
This will overwrite the existing graph with the current contents of model. Again if you are just using the default graph you can just do connection.put(model).
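Putting those pieces together, a minimal fetch-modify-put round trip might look like the following sketch; the dataset URL matches the example above, while the resource IRI and label are purely illustrative:

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.rdfconnection.RDFConnectionFactory;
import org.apache.jena.vocabulary.RDFS;

public class FusekiRoundTrip {
    public static void main(String[] args) {
        try (RDFConnection conn = RDFConnectionFactory.connect("http://localhost:3030/test/")) {
            // 1. Fetch the default graph (use conn.fetch("http://your-graph-name") for a named graph)
            Model model = conn.fetch();

            // 2. Modify the local copy with the ordinary Jena API
            Resource agent = model.createResource("http://example.org/agent/1");
            agent.addProperty(RDFS.label, "Agent number one");

            // 3. Replace the stored graph with the modified copy
            conn.put(model);
        }
    }
}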
I've been reading the H2O documentation for a while, and I haven't found a clear example of how to load a model trained and saved using the Python API. I was following this example:
import h2o
from h2o.estimators.naive_bayes import H2ONaiveBayesEstimator
model = H2ONaiveBayesEstimator()
h2o_df = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip")
model.train(y = "IsDepDelayed", x = ["Year", "Origin"],
training_frame = h2o_df,
family = "binomial",
lambda_search = True,
max_active_predictors = 10)
h2o.save_model(model, path="models")
But if you check the official documentation, it states that you have to download the model as a POJO from the Flow UI. Is that the only way, or can I achieve the same result via Python? Just for reference, I show the doc's example below. I need some guidance.
import java.io.*;
import hex.genmodel.easy.RowData;
import hex.genmodel.easy.EasyPredictModelWrapper;
import hex.genmodel.easy.prediction.*;

public class main {
    private static String modelClassName = "gbm_pojo_test";

    public static void main(String[] args) throws Exception {
        hex.genmodel.GenModel rawModel;
        rawModel = (hex.genmodel.GenModel) Class.forName(modelClassName).newInstance();
        EasyPredictModelWrapper model = new EasyPredictModelWrapper(rawModel);
        //
        // By default, unknown categorical levels throw PredictUnknownCategoricalLevelException.
        // Optionally configure the wrapper to treat unknown categorical levels as N/A instead:
        //
        // EasyPredictModelWrapper model = new EasyPredictModelWrapper(
        //         new EasyPredictModelWrapper.Config()
        //             .setModel(rawModel)
        //             .setConvertUnknownCategoricalLevelsToNa(true));

        RowData row = new RowData();
        row.put("Year", "1987");
        row.put("Month", "10");
        row.put("DayofMonth", "14");
        row.put("DayOfWeek", "3");
        row.put("CRSDepTime", "730");
        row.put("UniqueCarrier", "PS");
        row.put("Origin", "SAN");
        row.put("Dest", "SFO");

        BinomialModelPrediction p = model.predictBinomial(row);
        System.out.println("Label (aka prediction) is flight departure delayed: " + p.label);
        System.out.print("Class probabilities: ");
        for (int i = 0; i < p.classProbabilities.length; i++) {
            if (i > 0) {
                System.out.print(",");
            }
            System.out.print(p.classProbabilities[i]);
        }
        System.out.println("");
    }
}
h2o.save_model will save the binary model to the provided file system; however, looking at the Java application above, it seems you want to use the model in a Java-based scoring application.
Because of that, you should use the h2o.download_pojo API to save the model to the local file system along with the genmodel JAR file. The API is documented as follows:
download_pojo(model, path=u'', get_jar=True)
Download the POJO for this model to the directory specified by the path; if the path is "", then dump to screen.
:param model: the model whose scoring POJO should be retrieved.
:param path: an absolute path to the directory where POJO should be saved.
:param get_jar: retrieve the h2o-genmodel.jar also.
Once you have downloaded the POJO, you can use the sample application above to perform the scoring; make sure the POJO class name matches modelClassName and that the model type matches as well.
Using test cases, I was able to see how ELKI can be used directly from Java, but now I want to read my data from MongoDB and then use ELKI to cluster geographic (long, lat) data.
So far I can only cluster data from a CSV file using ELKI. Is it possible to connect de.lmu.ifi.dbs.elki.database.Database with MongoDB? I can see from the Java debugger that there is a databaseconnection field in de.lmu.ifi.dbs.elki.database.Database.
I query MongoDB, creating a POJO for each row, and now I want to cluster these objects using ELKI.
It is possible to read data from MongoDB, write it to a CSV file and then have ELKI read that CSV file, but I would like to know if there is a simpler solution.
---------FINDINGS_1:
From ELKI - Use List<String> of objects to populate the Database I found that I need to implement de.lmu.ifi.dbs.elki.datasource.DatabaseConnection and specifically override the loadData() method, which returns an instance of MultipleObjectsBundle.
So I think I should wrap a list of POJOs in a MultipleObjectsBundle. Now I'm looking at MultipleObjectsBundle and it looks like the data should be held in columns. Why is the columns data type a list of lists (List<List<?>>)? Shouldn't it be a plain List, just a list of the items you want to cluster?
I'm a little confused. How is ELKI going to know that it should look at the long and lat fields of my POJO? Where do I tell ELKI to do this? Using de.lmu.ifi.dbs.elki.data.type.SimpleTypeInformation?
---------FINDINGS_2:
I have tried to use ArrayAdapterDatabaseConnection and I have tried implementing DatabaseConnection. Sorry, I need things in very simple terms in order to understand.
This is my code for clustering:
int minPts = 3;
double eps = 0.08;

double[][] data1 = {
    {-0.197574246, 51.49960695}, {-0.084605692, 51.52128377}, {-0.120973687, 51.53005939}, {-0.156876, 51.49313},
    {-0.144228881, 51.51811784}, {-0.1680743, 51.53430039},   {-0.170134484, 51.52834133}, {-0.096440751, 51.5073853},
    {-0.092754157, 51.50597426}, {-0.122502346, 51.52395143}, {-0.136039674, 51.51991453}, {-0.123616824, 51.52994371},
    {-0.127854211, 51.51772703}, {-0.125979294, 51.52635795}, {-0.109006325, 51.5216612},  {-0.12221963, 51.51477076},
    {-0.131161087, 51.52505093}
};

// ArrayAdapterDatabaseConnection dbcon = new ArrayAdapterDatabaseConnection(data1);
DatabaseConnection dbcon = new MyDBConnection();

ListParameterization params = new ListParameterization();
params.addParameter(de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN.Parameterizer.MINPTS_ID, minPts);
params.addParameter(de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN.Parameterizer.EPSILON_ID, eps);
params.addParameter(DBSCAN.DISTANCE_FUNCTION_ID, EuclideanDistanceFunction.class);
params.addParameter(AbstractDatabase.Parameterizer.DATABASE_CONNECTION_ID, dbcon);
params.addParameter(AbstractDatabase.Parameterizer.INDEX_ID, RStarTreeFactory.class);
params.addParameter(RStarTreeFactory.Parameterizer.BULK_SPLIT_ID, SortTileRecursiveBulkSplit.class);
params.addParameter(AbstractPageFileFactory.Parameterizer.PAGE_SIZE_ID, 1000);

Database db = ClassGenericsUtil.parameterizeOrAbort(StaticArrayDatabase.class, params);
db.initialize();

GeneralizedDBSCAN dbscan = ClassGenericsUtil.parameterizeOrAbort(GeneralizedDBSCAN.class, params);

Relation<DoubleVector> rel = db.getRelation(TypeUtil.DOUBLE_VECTOR_FIELD);
Relation<ExternalID> relID = db.getRelation(TypeUtil.EXTERNALID);
DBIDRange ids = (DBIDRange) rel.getDBIDs();

Clustering<Model> result = dbscan.run(db);
int i = 0;
for (Cluster<Model> clu : result.getAllClusters()) {
    System.out.println("#" + i + ": " + clu.getNameAutomatic());
    System.out.println("Size: " + clu.size());
    System.out.print("Objects: ");
    for (DBIDIter it = clu.getIDs().iter(); it.valid(); it.advance()) {
        DoubleVector v = rel.get(it);
        ExternalID exID = relID.get(it);
        System.out.print("DoubleVec: [" + v + "]");
        System.out.print("ExID: [" + exID + "]");
        final int offset = ids.getOffset(it);
        System.out.print(" " + offset);
    }
    System.out.println();
    ++i;
}
The ArrayAdapterDatabaseConnection produces two clusters; I just had to play around with the value of epsilon. When I set epsilon=0.008, DBSCAN started creating clusters; when I set epsilon=0.04, all the items were in one cluster.
I have also tried to implement DatabaseConnection:
@Override
public MultipleObjectsBundle loadData() {
    MultipleObjectsBundle bundle = new MultipleObjectsBundle();
    List<Station> stations = getStations();
    List<DoubleVector> vecs = new ArrayList<DoubleVector>();
    List<ExternalID> ids = new ArrayList<ExternalID>();
    for (Station s : stations) {
        String strID = Integer.toString(s.getId());
        ExternalID i = new ExternalID(strID);
        ids.add(i);
        double[] st = {s.getLongitude(), s.getLatitude()};
        DoubleVector dv = new DoubleVector(st);
        vecs.add(dv);
    }
    SimpleTypeInformation<DoubleVector> type = new VectorFieldTypeInformation<>(
            DoubleVector.FACTORY, 2, 2, DoubleVector.FACTORY.getDefaultSerializer());
    bundle.appendColumn(type, vecs);
    bundle.appendColumn(TypeUtil.EXTERNALID, ids);
    return bundle;
}
These long/lat values are associated with an ID, and I need to link the clustered values back to that ID. Is the only way to do that using the ID offset (as in the code above)? I have tried to add an ExternalID column, but I don't know how to retrieve the ExternalID for a particular NumberVector.
Also, after seeing Using ELKI's Distance Function, I tried to use ELKI's longLatDistance, but it doesn't work and I could not find any examples of how to use it.
The interface for data sources is called DatabaseConnection.
JavaDoc of DatabaseConnection
You can implement a MongoDB-based interface to get the data.
It is not a complicated interface; it has a single method.
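For illustration, a sketch of what a MongoDB-backed implementation could look like is below. The collection and field names ("stations", "longitude", "latitude") and the MongoDB driver calls (current mongodb-driver-sync API) are assumptions on my part; the ELKI side mirrors the bundle code you already wrote.

import java.util.ArrayList;
import java.util.List;
import org.bson.Document;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import de.lmu.ifi.dbs.elki.data.DoubleVector;
import de.lmu.ifi.dbs.elki.data.ExternalID;
import de.lmu.ifi.dbs.elki.data.type.TypeUtil;
import de.lmu.ifi.dbs.elki.data.type.VectorFieldTypeInformation;
import de.lmu.ifi.dbs.elki.datasource.DatabaseConnection;
import de.lmu.ifi.dbs.elki.datasource.bundle.MultipleObjectsBundle;

public class MongoDBConnection implements DatabaseConnection {

    @Override
    public MultipleObjectsBundle loadData() {
        List<DoubleVector> vecs = new ArrayList<>();
        List<ExternalID> ids = new ArrayList<>();

        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> stations =
                    client.getDatabase("mydb").getCollection("stations");
            for (Document doc : stations.find()) {
                // keep the MongoDB id so clusters can be linked back to the source documents
                ids.add(new ExternalID(doc.getObjectId("_id").toHexString()));
                vecs.add(new DoubleVector(
                        new double[] { doc.getDouble("longitude"), doc.getDouble("latitude") }));
            }
        }

        MultipleObjectsBundle bundle = new MultipleObjectsBundle();
        bundle.appendColumn(new VectorFieldTypeInformation<>(
                DoubleVector.FACTORY, 2, 2, DoubleVector.FACTORY.getDefaultSerializer()), vecs);
        bundle.appendColumn(TypeUtil.EXTERNALID, ids);
        return bundle;
    }
}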
I need to create RDF/XML documents containing objects in the OSLC namespace.
e.g.
<oslc_disc:ServiceProviderCatalog
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/terms/"
xmlns:oslc_disc="http://open-services.net/xmlns/discovery/1.0/"
rdf:about="{self}">
<dc:title>{catalog title}</dc:title>
<oslc_disc:details rdf:resource="{catalog details uri}" />
What is the simplest way to create this document using the Jena API?
( I know about Lyo, they use a JSP for this doc :-)
Thanks, Carsten
Here's a complete example to start you off. Be aware that this will be equivalent to the XML output you want, but may not be identical: the order of properties, for example, may vary, and there are other ways to write the same content.
import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.vocabulary.DCTerms;

public class Jena {

    // Vocab items -- could use schemagen to generate a class for this
    final static String OSLC_DISC_NS = "http://open-services.net/xmlns/discovery/1.0/";

    final static Resource ServiceProviderCatalog =
            ResourceFactory.createResource(OSLC_DISC_NS + "ServiceProviderCatalog");

    final static Property details =
            ResourceFactory.createProperty(OSLC_DISC_NS, "details");

    public static void main(String[] args) {
        // Inputs
        String selfURI = "http://example.com/self";
        String catalogTitle = "Catalog title";
        String catalogDetailsURI = "http://example.com/catalogDetailsURI";

        // Create in-memory model
        Model model = ModelFactory.createDefaultModel();

        // Set prefixes
        model.setNsPrefix("dc", DCTerms.NS);
        model.setNsPrefix("oslc_disc", OSLC_DISC_NS);

        // Add item of type ServiceProviderCatalog
        Resource self = model.createResource(selfURI, ServiceProviderCatalog);

        // Add the title
        self.addProperty(DCTerms.title, catalogTitle);

        // Add details, which points to a resource
        self.addProperty(details, model.createResource(catalogDetailsURI));

        // Write pretty RDF/XML
        model.write(System.out, "RDF/XML-ABBREV");
    }
}
First of all, I'm not sure if the title or the tags are correct. If not, please correct me.
My question is whether there are any tools or ways to create an autocomplete list with items from an external source, so that NetBeans can parse it and warn me if there are any errors.
-- The problem: I use JDBC and I want to somehow model all my schemas, tables and columns so that NetBeans can parse them and warn me if anything is wrong. For example, with normal use of JDBC I would have a function:
ResultSet execSelect(String cols, String table) {
    return statement.executeQuery("SELECT " + cols + " FROM " + table);
}
The problem is that the caller has to know exactly which values are valid for those parameters in order to pass the correct strings.
I would like NetBeans to show me an autocomplete list with all the available options.
PS. I had exactly the same problem when I was building a web application and wanted to somehow get all the paths of my external resources such as images, .js files, .css files, etc.
-- Thoughts so far:
My thought so far has been to put the model in a .java file with public static final String variables inside nested static classes, so that I could access them from anywhere. For example:
DatabaseModel.MySchema.TableName1.ColumnName2
would be a String variable holding the 'ColumnName2' column of the 'TableName1' table. That would help with autocompletion, but the problem is that there is no type checking. In other words, someone could still use any string, globally defined or not, as a table or a column, which is not correct either. I'm thinking of using nested enums somehow to get type checking, but I'm not sure that would be a good solution in any case.
Any thoughts?
Finally I came up with writing a "script" that connects to MySQL, reads all the metadata (every column of every table of every schema) and creates a java file with predefined classes and Strings that describe the model. For example:
- If you want the name of the column C1 from table T1 from schema S1, you would type DatabaseModel.S1.T1.C1._, which is a public static final String holding the column name.
- If you want the table T2 from schema S2, you would type DatabaseModel.S2.T2, which is a class that implements the DatabaseTable interface. So the function execSelect could take a DatabaseTable and a DatabaseColumn as parameters, as sketched below.
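To illustrate the idea, here is a rough sketch of how execSelect could accept the generated classes instead of raw strings; passing them as Class tokens and using the simple class name as the SQL identifier is just one possible convention, not part of the generator below:

// A rough sketch only: "statement" is the java.sql.Statement from the question,
// and the nested interfaces come from the generated DatabaseModel class.
ResultSet execSelect(Class<? extends DatabaseModel.DatabaseColumn> column,
                     Class<? extends DatabaseModel.DatabaseTable> table) throws SQLException {
    // The generated class names mirror the real column/table names,
    // so the simple name can be used directly in the SQL text.
    return statement.executeQuery(
            "SELECT " + column.getSimpleName() + " FROM " + table.getSimpleName());
}

// Usage: only generated tables/columns compile, e.g.
// execSelect(DatabaseModel.S1.T1.C1.class, DatabaseModel.S1.T1.class);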
Here is the code that generates the model (not tested, but I think the idea is clear).
public static void generateMysqlModel(String outputFile) throws IOException, SQLException {
    //** Gather the database model
    // Maps a schema -> table -> column
    HashMap<String, HashMap<String, ArrayList<String>>> model =
            new HashMap<String, HashMap<String, ArrayList<String>>>();

    openDatabase();
    Connection sqlConn = DriverManager.getConnection(url, username, password);
    DatabaseMetaData md = sqlConn.getMetaData();
    ResultSet schemas = md.getSchemas();                         // Get schemas
    while (schemas.next()) {                                     // For every schema
        String schemaName = schemas.getString(1);
        model.put(schemaName, new HashMap<String, ArrayList<String>>());
        ResultSet tables = md.getTables(null, null, "%", null);  // Get tables
        while (tables.next()) {                                  // For every table
            String tableName = tables.getString(3);
            model.get(schemaName).put(tableName, new ArrayList<String>());
            // Get columns for table
            Statement s = sqlConn.createStatement();             // Get columns
            s.execute("show columns in " + tables.getString(3) + ";");
            ResultSet columns = s.getResultSet();
            while (columns.next()) {                             // For every column
                String columnName = columns.getString(1);
                model.get(schemaName).get(tableName).add(columnName);
            }
        }
    }
    closeDatabase();

    //** Create the java file from the collected model
    new File(outputFile).createNewFile();
    BufferedWriter bw = new BufferedWriter(new FileWriter(outputFile));
    bw.append("public class DatabaseModel{\n");
    bw.append("\tpublic interface DatabaseSchema{};\n");
    bw.append("\tpublic interface DatabaseTable{};\n");
    bw.append("\tpublic interface DatabaseColumn{};\n\n");
    for (String schema : model.keySet()) {
        HashMap<String, ArrayList<String>> schemaTables = model.get(schema);
        bw.append("\tpublic static final class " + schema + " implements DatabaseSchema{\n");
        //bw.append("\t\tpublic static final String _ = \"" + schema + "\";\n");
        for (String table : schemaTables.keySet()) {
            System.out.println(table);
            ArrayList<String> tableColumns = schemaTables.get(table);
            bw.append("\t\tpublic static final class " + table + " implements DatabaseTable{\n");
            //bw.append("\t\t\tpublic static final String _ = \"" + table + "\";\n");
            for (String column : tableColumns) {
                System.out.println("\t" + column);
                bw.append("\t\t\tpublic static final class " + column + " implements DatabaseColumn{"
                        + " public static final String _ = \"" + column + "\";\n"
                        + "}\n");
            }
            bw.append("\t\t\tpublic static String val(){ return this.toString(); }");
            bw.append("\t\t}\n");
        }
        bw.append("\t\tpublic static String val(){ return this.toString(); }");
        bw.append("\t}\n");
    }
    bw.append("}\n");
    bw.close();
}
PS. For the resources case in a web application, I guess someone could recursively collect all the files under the "resources" folder and fill in the model variable; that would produce a java file with the file paths. The interfaces in that case could be the file types or any other "file view" you want.
I also thought it would be useful to be able to generate the .java file from an XML file, so anyone could just write some kind of definition in an XML file for that purpose.
If someone implements anything like that, please post it here.
Any comments/improvements are welcome.