I configured Maven and managed to run example plugins like FulltextIndex (https://github.com/neo4j-contrib/neo4j-rdf/blob/master/src/main/java/org/neo4j/rdf/fulltext/FulltextIndex.java).
Still, I struggle to create a simple function by myself. I want to have a Java function that can find a node by ID and return its properties.
I know I can do this in Cypher, but the goal is to understand the logic of plugins for Neo4j.
So after installing the plugin I should be able to type in:
INPUT ID
call example.function(217)
OUTPUT, e.g.
name = Tree, age = 85, label = Plant, location = Munich
Thanks a lot!
In Neo4j, user-defined procedures are simple .jar files that you put in the $NEO4J_HOME/plugins directory. Logically, to create a new user-defined procedure you will need to generate this jar file. You can do it by configuring a new Maven project or by using the Neo4j Procedure Template repository.
User-defined procedures are simply Java classes with methods annotated with @Procedure. If the procedure writes to the database, then mode = WRITE should be defined (not your case).
You will also need to query the database to get the node by ID and return its properties. To do that, you will need to inject the GraphDatabaseService into your Java class using the @Context annotation.
To achieve your goal, I believe you will need to use the getNodeById() method of GraphDatabaseService and then read the properties (e.g. getAllProperties()) of the returned Node.
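Putting those pieces together, here is a minimal sketch of such a procedure, assuming the Neo4j 3.x procedure API (the class name, the result type and the example.function name are only illustrations):

import java.util.stream.Stream;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.procedure.Context;
import org.neo4j.procedure.Name;
import org.neo4j.procedure.Procedure;

public class ExampleProcedure {

    // Injected by Neo4j when the procedure is invoked.
    @Context
    public GraphDatabaseService db;

    // One output row per property of the node.
    public static class PropertyResult {
        public String key;
        public Object value;

        public PropertyResult(String key, Object value) {
            this.key = key;
            this.value = value;
        }
    }

    // Read-only procedure, callable as: CALL example.function(217)
    @Procedure("example.function")
    public Stream<PropertyResult> function(@Name("id") long id) {
        Node node = db.getNodeById(id);
        return node.getAllProperties().entrySet().stream()
                .map(e -> new PropertyResult(e.getKey(), e.getValue()));
    }
}

After packaging this class into a jar, dropping it into $NEO4J_HOME/plugins and restarting the server, CALL example.function(217) should return one row per property of node 217.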
What you are looking for is User Defined Functions / Procedures. There is a dedicated section in the Neo4j documentation:
https://neo4j.com/developer/procedures-functions/#_extending_cypher
http://neo4j.com/docs/developer-manual/current/extending-neo4j/procedures/#user-defined-procedures
You can also look at APOC, which contains hundreds of such examples used in real life.
https://github.com/neo4j-contrib/neo4j-apoc-procedures
Is it possible to use Google Guice as a dependency injection provider for an Apache Spark Java application?
I am able to achieve this when the execution happens at the driver, but I have no control when the execution happens at the executors.
Is it even possible to use the injected objects at the executors? It's hard to manage the code without dependency injection in Spark applications.
I think the neutrino framework is aimed exactly at your requirement.
Disclaimer: I am the author of the neutrino framework.
This framework provides the capability to use dependency injection (DI) to generate the objects and control their scope at both the driver and executors.
How does it do that?
As we know, to adopt a DI framework we first need to build a dependency graph, which describes the dependency relationships between the various types and can be used to generate instances along with their dependencies. Guice uses its Module API to build the graph, while the Spring framework uses XML files or annotations.
Neutrino is built on the Guice framework and, of course, builds the dependency graph with the Guice module API. It not only keeps the graph in the driver, but also has the same graph running on every executor.
In the dependency graph, some nodes may generate objects that get passed to the executors, and the neutrino framework assigns unique ids to these nodes. As every JVM has the same graph, the graph on each JVM has the same node id set.
When an instance to be transferred is requested from the graph at the driver, instead of creating the actual instance it just returns a stub object which holds the object creation method (including the node id). When the stub object is passed to an executor, the framework finds the corresponding node in the graph in the executor JVM by that id and recreates the same object and its dependencies there.
Here is a simple example (it just filters an event stream based on Redis data):
trait EventFilter[T] {
def filter(t: T): Boolean
}
// The RedisEventFilter class depends on JedisCommands directly,
// and doesn't extend `java.io.Serializable` interface.
class RedisEventFilter @Inject()(jedis: JedisCommands)
  extends EventFilter[ClickEvent] {

  override def filter(e: ClickEvent): Boolean = {
    // filter logic based on redis goes here; placeholder result keeps the sketch compilable
    true
  }
}
/* create injector */
val injector = ...
val eventFilter = injector.instance[EventFilter[ClickEvent]]
val eventStream: DStream[ClickEvent] = ...
eventStream.filter(e => eventFilter.filter(e))
Here is how to configure the bindings:
class FilterModule(redisConfig: RedisConfig) extends SparkModule {
override def configure(): Unit = {
// the magic is here
// The method `withSerializableProxy` will generate a proxy
// extending `EventFilter` and `java.io.Serializable` interfaces with Scala macro.
// The module must extend `SparkModule` or `SparkPrivateModule` to get it
bind[EventFilter[ClickEvent]].withSerializableProxy
.to[RedisEventFilter].in[SingletonScope]
}
}
With neutrino, RedisEventFilter doesn't even have to care about the serialization problem. Everything just works as if it were in a single JVM.
For details, please refer to the neutrino readme file.
Limitation
Since this framework uses a Scala macro to generate the proxy class, the Guice modules and the logic of how to wire up these modules need to be written in Scala. Other classes, such as EventFilter and its implementations, can be Java.
I faced a problem when trying to use Querydsl with multiple schemas. I added multiple schemas as below:
<schemaPattern>ABC,DEF</schemaPattern>
and my table name pattern is
<tableNamePattern>PQR,STU</tableNamePattern>
Suppose both schemas have a DEF table; then when I compile the Maven project it gives me the error below.
Failed to execute goal com.querydsl:querydsl-maven-plugin:4.2.1:export (default) on project TestProject:
Execution default of goal com.querydsl:querydsl-maven-plugin:4.2.1:export failed: Attempted to write multiple times to D:\test\repos\testProject\target\generated-sources\testPackage\domain\dependency\QDEF.java, please check your configuration
Can anyone tell me a way to resolve this, and also explain how to access the generated classes for a specific schema? (For example, I would normally declare QDEF qdef = QDEF.qdef, but how can I declare the QDEF of the STU schema?)
I believe this was resolved here. It looks like <schemaToPackage>true</schemaToPackage> is what you need.
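For reference, here is a sketch of how that flag could fit into the plugin configuration (the JDBC settings are placeholders; the schema/table patterns and target package are taken from your question):

<plugin>
  <groupId>com.querydsl</groupId>
  <artifactId>querydsl-maven-plugin</artifactId>
  <version>4.2.1</version>
  <configuration>
    <!-- placeholders: keep your existing connection settings here -->
    <jdbcDriver>...</jdbcDriver>
    <jdbcUrl>...</jdbcUrl>
    <packageName>testPackage.domain.dependency</packageName>
    <schemaPattern>ABC,DEF</schemaPattern>
    <tableNamePattern>PQR,STU</tableNamePattern>
    <!-- generate each schema's classes into their own sub-package -->
    <schemaToPackage>true</schemaToPackage>
  </configuration>
</plugin>

With schemaToPackage enabled, each schema's classes should land in a schema-specific sub-package instead of colliding in a single QDEF.java, so you can import and use the QDEF of whichever schema you need.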
I've implemented some custom SPARQL functions with Jena ARQ, following the Property Function docs. Those functions work with a local dataset using a model:
Model model = ModelFactory.createDefaultModel();
model.read(new FileInputStream("data/data.ttl"), null, "TTL");
Query query = QueryFactory.create(queryString) ; // the queryString contains a custom property function defined with Jena
try (QueryExecution qexec = QueryExecutionFactory.create(query, model)) {
...
}
However, I need to apply these property functions to queries on a dataset in my GraphDB repository, so I tried to connect my program to GraphDB using Jena.
I've tried following 'Using GraphDB with Jena', but it seems out of date and cannot be used as-is, because SailRepository.initialize() is deprecated and SesameDataset no longer exists to initialize my model.
import org.eclipse.rdf4j.repository.sail.SailRepository;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import com.ontotext.jena.SesameDataset;
...
RepositoryConnection connection = repository.getConnection();
// finally, create the DatasetGraph instance
SesameDataset dataset = new SesameDataset(connection);
I've also tried Jena's RDFConnection, but it doesn't work with my custom functions since there is no backing model to apply the queries to.
Can anyone tell me where I can find the SesameDataset to import, or whether there is another way to query a GraphDB repository with custom functions?
You would probably make your life a lot easier by using RDF4J to communicate with GraphDB, instead of Jena - it is the recommended Java API for this purpose by the GraphDB developers.
You can implement custom SPARQL functions in RDF4J/GraphDB by following this tutorial - in summary it's a matter of implementing the org.eclipse.rdf4j.query.algebra.evaluation.function.Function interface and making sure your implementation is on the classpath of GraphDB and registered as an SPI implementation.
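As a rough sketch (the function name and namespace below are made up for illustration), an implementation of that interface looks like this; it is registered by listing its fully qualified class name in META-INF/services/org.eclipse.rdf4j.query.algebra.evaluation.function.Function inside the jar you put on GraphDB's classpath:

import org.eclipse.rdf4j.model.Value;
import org.eclipse.rdf4j.model.ValueFactory;
import org.eclipse.rdf4j.query.algebra.evaluation.ValueExprEvaluationException;
import org.eclipse.rdf4j.query.algebra.evaluation.function.Function;

// Hypothetical example: returns the length of the string value of its argument.
public class StrLengthFunction implements Function {

    @Override
    public String getURI() {
        // The IRI used to call the function from SPARQL (made-up namespace).
        return "http://example.org/custom#strLength";
    }

    @Override
    public Value evaluate(ValueFactory vf, Value... args) throws ValueExprEvaluationException {
        if (args.length != 1) {
            throw new ValueExprEvaluationException("strLength requires exactly one argument");
        }
        return vf.createLiteral(args[0].stringValue().length());
    }
}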
Jeen (the creator of RDF4J) pointed out how you can create custom functions.
However, as AndyS (the creator of Jena) points out, you seem to need magic predicates (custom property functions). For that I think that you need to create a plugin and implement pattern interpretation as described here: http://graphdb.ontotext.com/documentation/standard/plug-in-api.html#pattern-interpretation.
Most plugins are open source at https://github.com/Ontotext-AD, so you can see many examples. I know for sure that https://github.com/Ontotext-AD/graphdb-geosparql-plugin uses magic predicates, e.g.:
?x geo:sfWithin ?y is a predicate that checks (or returns) all features where ?x is within ?y
geof:sfWithin(?x,?y) (note the different namespace) is a function that checks the same condition, but must be called with bound params: it cannot find features satisfying the condition.
See this search, then look at the IRI GEO_SF_WITHIN; searching for GEO_SF_WITHIN you will find com/ontotext/trree/geosparql/GeoSparqlFunction.java and com/useekm/geosparql/algebra/PropertyToFunctionOptimizer.java.
I have an application where I am using Apache Spark 1.4.1 on a standalone cluster. The code of this application has evolved and is quite complicated (more than the few lines of code we see in most Apache Spark examples), with lots of method calls from one class to another.
I am trying to add code so that, when it encounters a problem with the data (while processing it on the cluster nodes), it notifies an external application. For contacting the external application we have connection details set up in a config file. I want to pass the connection details to the cluster nodes somehow, but passing them to each method that runs on the nodes (as parameters or as a broadcast variable) is not OK for my application, because it means each and every method has to pass them along, and we have lots of chained method calls (method A calls B, B calls C, ..., Y calls Z), which is different from most Apache Spark examples where we see only one or two method calls.
I am trying to work around this problem - is there a way to pass data to the nodes besides method parameters and broadcast variables? For example, I was looking at setting an environment property pointing to the config file (using System.setProperty) on all nodes, so that I can read the connection details on the fly and keep the code isolated in a single block, but I've had no luck so far.
Actually, after some hours of investigation I found a way that really suits my needs. There are two Spark properties (one for the driver, one for the executors) that can be used to pass parameters which can then be read using System.getProperty():
spark.executor.extraJavaOptions
spark.driver.extraJavaOptions
Using them is simpler than the approach suggested in the post above, and you can easily make your application switch configurations from one environment to another (e.g. QA/DEV vs. PROD) when you have all the environments set up in your project.
They can be set in the SparkConf object when you're initializing the SparkContext.
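For example, a minimal sketch in Java (the config.file.path property name and the path are hypothetical):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ExtraJavaOptionsExample {
    public static void main(String[] args) {
        // Pass the same -D system property to the driver and to every executor JVM.
        String opts = "-Dconfig.file.path=/etc/myapp/connection.conf"; // hypothetical property and path
        SparkConf conf = new SparkConf()
                .setAppName("extra-java-options-example")
                .set("spark.driver.extraJavaOptions", opts)
                .set("spark.executor.extraJavaOptions", opts);

        JavaSparkContext sc = new JavaSparkContext(conf);

        // Anywhere in the code, on the driver or inside a task running on an executor:
        String configPath = System.getProperty("config.file.path");
        System.out.println("External app config file: " + configPath);

        sc.stop();
    }
}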
The post that helped me a lot in figuring out the solution is: http://progexc.blogspot.co.uk/2014/12/spark-configuration-mess-solved.html
The properties you provide via --properties-file are loaded at runtime and are available only on the driver, not on any of the executors. But you can always make them available to the executors.
Simple hack:
private static String getPropertyString(String key, Boolean mandatory) {
    // First look the key up in the SparkConf (populated from the properties file on the driver).
    String value = sparkConf.get(key, null);
    if (mandatory && value == null) {
        // On an executor the SparkConf entry is missing, so fall back to the environment.
        value = sparkConf.getenv(key);
        if (value == null)
            shutDown(key); // Or whatever action you would like to take
    }
    // On the driver: propagate the value to the executors as an environment variable.
    if (value != null && sparkConf.getenv(key) == null)
        sparkConf.setExecutorEnv(key, value);
    return value;
}
The first time your driver starts, it will find all the properties provided in the properties file in the SparkConf. As soon as it finds them, check whether each key is already present in the environment; if not, set those values for the executors using setExecutorEnv in your program.
It's tough to tell whether your program is running in the driver or in an executor, so check whether the property exists in the SparkConf; if not, check it against the environment using getenv(key).
I suggest the following solution:
1. Put the configuration in a database.
2. Put the database connection details in a JOCL (Java Object Configuration Language) file and have this file available on the classpath of each executor.
3. Make a singleton class that reads the DB connection details from the JOCL file, connects to the database, extracts the configuration info and exposes it as getter methods (see the sketch after this list).
4. Import the class into the context where you have your Spark calls and use it to access the configuration from within them.
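As a rough illustration of steps 3 and 4, here is a minimal sketch of such a singleton. It uses a plain java.util.Properties file and made-up property names as a stand-in for the JOCL file and the configuration database; the point is that each executor JVM loads the class (and therefore the configuration) locally from its own classpath, so nothing has to be passed down the chain of method calls or serialized into closures:

import java.io.InputStream;
import java.util.Properties;

public final class ExternalAppConfig {

    private static volatile ExternalAppConfig instance;

    private final String notifyUrl;
    private final String notifyUser;

    private ExternalAppConfig() {
        // Stand-in for reading the JOCL file and querying the configuration database.
        try (InputStream in = ExternalAppConfig.class.getResourceAsStream("/external-app.properties")) {
            Properties props = new Properties();
            props.load(in);
            this.notifyUrl = props.getProperty("notify.url");
            this.notifyUser = props.getProperty("notify.user");
        } catch (Exception e) {
            throw new RuntimeException("Could not load external application configuration", e);
        }
    }

    public static ExternalAppConfig get() {
        if (instance == null) {
            synchronized (ExternalAppConfig.class) {
                if (instance == null) {
                    instance = new ExternalAppConfig();
                }
            }
        }
        return instance;
    }

    public String getNotifyUrl() { return notifyUrl; }

    public String getNotifyUser() { return notifyUser; }
}

Inside any method that eventually runs on an executor (method Z in your chain), ExternalAppConfig.get().getNotifyUrl() then returns the connection details without them having been threaded through methods A to Y.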
We are setting up a slightly complicated project using Play Framework 2.0.3.
We need to access several (pre-existing) databases and would like to do it using the framework's built-in facilities (i.e. Ebean).
We tried to create all model classes within the "models" package and then map each class, by its FQN, to the corresponding Ebean property in application.conf:
ebean.firstDB="models.ClassA,models.ClassB,models.ClassC"
ebean.secondDB="models.ClassD"
ebean.thirdDB="models.ClassE,models.ClassF"
This doesn't seem to work:
PersistenceException: Error with [models.SomeClass] It has not been enhanced but it's superClass [class play.db.ebean.Model] is? (You are not allowed to mix enhancement in a single inheritance hierarchy) marker[play.db.ebean.Model] className[models.SomeClass]
We checked and re-checked and the configuration is OK!
We then tried to use a different Java package for each database's model classes and map them accordingly in application.conf:
ebean.firstDB = "packageA.*"
ebean.secondDB = "packageB.*"
ebean.thirdDB = "packageC.*"
This works fine when reading information from the database, but when you try to save/update objects we get:
PersistenceException: The default EbeanServer has not been defined? This is normally set via the ebean.datasource.default property. Otherwise it should be registered programatically via registerServer()
Any ideas?
Thanks!
Ricardo
You have to specify in your query which database you want to access.
For example, if you want to retrieve all users from your secondDB :
// Get access to your secondDB
EbeanServer secondDB = Ebean.getServer("secondDB");
// Get all users in secondDB
List<User> userList = secondDB.find(User.class).findList();
When using save(), delete(), update() or refresh(), you have to specify the Ebean server, for instance for the save() method:
classA.save("firstDB");
I encountered the same problem and wasted a whole day investigating it; finally I got it.
1. Define named Ebean servers
db.default.driver=com.mysql.jdbc.Driver
db.default.url="jdbc:mysql://localhost:3306/db1"
db.default.user=root
db.default.password=123456
db.aux.driver=com.mysql.jdbc.Driver
db.aux.url="jdbc:mysql://localhost:3306/db2"
db.aux.user=root
db.aux.password=123456
Now you have two Ebean servers, [default] and [aux], at run time.
2. Map entity packages to servers in the app conf file
ebean.default="models.*"
ebean.aux= "secondary.*"
Now entities under the models.* package are configured for the [default] server and entities under the secondary.* package are configured for the [aux] server. I think this is related to Java class enhancement or something similar. You don't need to separate entities into different packages, but if entities of different Ebean servers are under the same package, it may cause weird trouble and exceptions.
When using your model, the save/delete/update-related methods should take the server name as a parameter:
Student s = new Student();
s.save("aux");
When using a finder, you should define it as:
public static Finder<Long, Student> find = new Finder<Long, Student>("aux", Long.class, Student.class);
It might not be the same case, but I ran into this "SomeClass not enhanced" PersistenceException with Play 2.1.0, and the only thing missing was a public declaration in the SomeClass model class that I had forgotten.
In Play 2.1.0 the error message was a little different:
PersistenceException: java.lang.IllegalStateException: Class [class play.db.ebean.Model] is enhanced and [class models.Address] is not - (you can not mix!!)
This solved my issue with saving to my db table and resolved the error:
"javax.persistence.PersistenceException: The default EbeanServer has not been defined ? This is normally set via the ebean.datasource.default property. Otherwise it should be registered programatically via registerServer()"