I have a requirement where I need to create a third-party application that will test for an object's presence in Oracle Coherence.
Scenario: our main application uses Oracle Coherence to store some data. Now I have to create a separate application (running on a different server, outside the Coherence cluster) that will detect whether a particular object is present in Coherence. We have no plans to run Coherence on that machine either.
Can a third-party application (one that is not part of the Coherence cluster) connect to Coherence and fetch data? If yes, how? Can I get some pointers on doing this?
There are multiple ways you can do it.
1) Use Coherence Extend - This allows any application to interact with Coherence without being part of the Coherence cluster.
Refer to http://docs.oracle.com/cd/E14526_01/coh.350/e14509/configextend.htm
This option is supported only if the third-party application is written in Java, .NET, or C++:
http://coherence.oracle.com/display/COH35UG/Coherence+Extend#CoherenceExtend-Typesofclients
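As a sketch, the client side of an Extend setup is a cache configuration pointing a remote-cache-scheme at the cluster's proxy service; the host, port, and scheme names below are placeholders, so check them against the configuration documentation linked above:

```xml
<!-- Hypothetical client-side cache config; address/port must match your proxy -->
<caching-scheme-mapping>
  <cache-mapping>
    <cache-name>*</cache-name>
    <scheme-name>extend-scheme</scheme-name>
  </cache-mapping>
</caching-scheme-mapping>
<caching-schemes>
  <remote-cache-scheme>
    <scheme-name>extend-scheme</scheme-name>
    <initiator-config>
      <tcp-initiator>
        <remote-addresses>
          <socket-address>
            <address>coherence-proxy-host</address>
            <port>9099</port>
          </socket-address>
        </remote-addresses>
      </tcp-initiator>
    </initiator-config>
  </remote-cache-scheme>
</caching-schemes>
```

The cluster side needs a matching proxy-scheme listening on that address; see the configextend documentation above for the server half.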
2) Use the REST API - Newer versions of Coherence expose cache data management through REST APIs. Refer to http://docs.oracle.com/cd/E24290_01/coh.371/e22839/rest_intro.htm
This option places no restriction on the client/third-party technology, since it is based on XML/JSON over HTTP.
Using REST you can check for the presence of a cache key as shown below.
GET Operation
GET http://{host}:{port}/cacheName/key
Returns a single object from the cache based on a key. A 404 (Not Found) message is returned if the object with the specified key does not exist.
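A minimal sketch of that presence check from plain Java, using only the JDK's HTTP client classes; the base URL and cache name are placeholders, and this assumes the 200/404 behavior described above:

```java
import java.net.HttpURLConnection;
import java.net.URL;

/** Sketch of a key-presence check against a Coherence REST endpoint.
 *  The base URL and cache name passed in are placeholders, not real values. */
public class CoherenceKeyCheck {

    /** Returns true when GET on the entry URL answers 200, false on 404. */
    public static boolean isKeyPresent(String baseUrl, String cacheName, String key)
            throws Exception {
        URL url = new URL(baseUrl + "/" + cacheName + "/" + key);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        int status = conn.getResponseCode();  // does not throw on 404
        conn.disconnect();
        return status == 200;  // Coherence REST returns 404 when the key is absent
    }
}
```

Usage would be something like `isKeyPresent("http://rest-proxy:8080", "dist-cache", "42")`, where the URL is whatever your REST-enabled proxy listens on.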
I created such a tool some time back using the C++ API.
https://github.com/actsasflinn/coherence-tool
I also wrapped the C++ API in a Ruby binding for scripting purposes.
https://github.com/actsasflinn/ruby-coherence
Either of these can run standalone outside of the cluster; both rely on the TCP proxy method of communicating with the cluster.
I have an App Engine project which persists its data in a Cloud SQL instance, using EclipseLink as the JPA persistence manager.
Due to the nature of App Engine (a multi-instance environment) we have some concerns about how to synchronize the JPA cache between the instances.
Each JPA instance runs inside a single App Engine instance, so the Memcache service of App Engine is not used (besides, EclipseLink does not "know" what App Engine Memcache is or how to use it).
Here is a simple scenario example:
- Instance A read object 1: value="A"
- Instance B read object 1: value="A"
- Instance A write object 1: value="B"
- JPA cache of Instance A is evicted due to write operation
- Instance A read object 1: value="B" (the value is retrieved from the database because cache has been evicted after write operation)
- Instance B read object 1: value="A" (no write operation has been performed, the cache is still valid so the value has not been updated)
Searching around for this kind of behaviour, I found several articles that discuss it [1] [2] [3] [4].
I quote:
unless the database is modified directly by other applications, or by
the same application on other servers in a clustered environment
Given the nature of App Engine, we can consider it "other servers in a clustered environment", so this seems to be exactly that case.
Of course, the proper way to handle this problem would be to build a cache layer for JPA on top of the App Engine Memcache service, but from my searches I understand that EclipseLink does not allow developing a custom cache layer.
I'm willing to build something that bridges EclipseLink and App Engine Memcache, but I cannot find any reference to the proper "hooks" for doing so.
From the documentation, there are a few suggestions on how to handle this:
disable the shared cache: this is not a suitable option due to the loss of application performance
using a distributed cache (such as Oracle TopLink Grid with Oracle Coherence):
I would like to use the App Engine Memcache service, but as I understand it there is no EclipseLink "hook" we can use
using cache coordination (synchronizing the caches, as discussed in this example)
The provided method does not seem usable in the App Engine environment
Is there a known solution for properly handling this cache scenario?
The scenario here is very clear: when a write operation is made in one instance, all the other instances' JPA caches need to be "notified" as well, so they can evict the stale entry.
[1] https://wiki.eclipse.org/EclipseLink/Examples/JPA/CacheCoordination
[2] http://www.eclipse.org/eclipselink/documentation/2.5/concepts/cache011.htm
[3] https://wiki.eclipse.org/EclipseLink/UserGuide/JPA/Basic_JPA_Development/Caching/Coordination
[4] https://wiki.eclipse.org/EclipseLink/Examples/JPA/Caching#Caching_in_Clustered_Environments
I took a look at the EclipseLink (EL) source to see if there is any easy way to extend the cache coordination mechanism to work with GAE.
EL supports JMS and RMI by default, and cache coordination is built around remoting: EL sends commands (org.eclipse.persistence.sessions.coordination.Command) which are executed against the AbstractSession on every host in the cluster.
I don't think there is any way you could use Memcache for this, because the commands, like MergeChangeSetCommand, always operate on the AbstractSession.
It is possible to build your own cache coordination protocol. This is done by extending org.eclipse.persistence.sessions.coordination.TransportManager and setting eclipselink.cache.coordination.protocol=com.example.MyTransportManager, but the DiscoveryManager uses multicast, which is typically not available in the cloud. If you could discover all your GAE instances (and send data directly to each node), I think it would be possible to create an HTTP-based cache coordination solution. On AWS it is possible to ask the load balancer for the list of nodes; this is how we get around the multicast problem when we need to use Hazelcast for intra-node communication.
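For reference, wiring such a custom transport would look something like this in persistence.xml (com.example.MyTransportManager being the hypothetical class name used above):

```xml
<!-- Hypothetical custom cache-coordination transport, as described above -->
<property name="eclipselink.cache.coordination.protocol"
          value="com.example.MyTransportManager"/>
```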
If I had an application that stored information in its datastore, is there a way to access that same datastore from a second application?
Yes you can, with the Remote API.
For example, you can use Remote API to access a production datastore from an app running on your local machine. You can also use Remote API to access the datastore of one App Engine app from a different App Engine app.
You need to configure the servlet (see the documentation for that) and import appengine-remote-api.jar into your project (you can find it in ..\appengine-java-sdk\lib\).
Just remember that ancestor queries do not work with the Remote API (see this).
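For reference, the servlet configuration mentioned above is typically a web.xml entry along these lines (the URL pattern is a common choice, not mandatory):

```xml
<servlet>
  <servlet-name>RemoteApiServlet</servlet-name>
  <servlet-class>com.google.apphosting.utils.remoteapi.RemoteApiServlet</servlet-class>
</servlet>
<servlet-mapping>
  <servlet-name>RemoteApiServlet</servlet-name>
  <url-pattern>/remote_api</url-pattern>
</servlet-mapping>
```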
You didn't mention why you want to access the datastore of one application from another, but depending on the nature of your situation, App Engine Modules might be a solution. These are structurally similar to separate applications, but they run under the same application "umbrella" and can access a common datastore.
You cannot directly access the datastore of another application. Your application must actively serve that data in order for another application to be able to access it. The easiest way to achieve this is via the Remote API, which needs a piece of code installed in order to serve the data.
If you would like to have two separate code bases (even serving different hostnames/URLs), then look at the new App Engine Modules. They give you the ability to run totally different code on separate URLs and with different runtime settings (instances), while still being one application sharing all stateful services (datastore, task queue, memcache, ...).
I have recently started looking into Infinispan as our caching layer, and I have read through the operation modes in Infinispan, as described below.
Embedded mode: This is when you start Infinispan within the same JVM as your applications.
Client-server mode: This is when you start a remote Infinispan instance and connect to it using a variety of different protocols.
Firstly, I am now confused about which of the two modes above is best suited to my application.
I have a very simple use case: client-side code calls our REST service using the service's main VIP, the call is load balanced to an individual service server where our service is deployed, and that server then queries the Cassandra database to retrieve the data based on the user id. The picture below makes everything clear.
For example, if a client is looking for some data for userId = 123, it calls our REST service using the main VIP and the call is load balanced to any of our four service servers. Suppose it lands on Service1; then Service1 queries the Cassandra database for the record for userId = 123 and returns it to the client.
Now we are planning to cache the data using Infinispan, as compaction is killing our performance, so that our read performance gets a boost. So I started looking into Infinispan and stumbled upon the two modes mentioned above. I am not sure which is the best way to use Infinispan in our case.
Secondly, here is what I am expecting from the Infinispan cache: suppose I go with embedded mode, then it should look something like this.
If yes, then how will the Infinispan caches interact with each other? It is possible that at some point we will look for data for userIds that live in another service instance's Infinispan cache, right? So what will happen in that scenario? Will Infinispan take care of that as well? If yes, what configuration setup do I need to make sure this works fine?
Pardon my ignorance if I am missing anything. Any clear information on these two questions will make things much clearer to me.
With regards to your second image: yes, the architecture will look exactly like that.
If yes, then how Infinispan cache will interact with each other?
Please, take a look here: https://docs.jboss.org/author/display/ISPN/Getting+Started+Guide#GettingStartedGuide-UsingInfinispanasanembeddeddatagridinJavaSE
Infinispan will manage it using the JGroups protocol, sending messages between nodes. The cluster will be formed and the nodes clustered. After that you will see the expected replication of entries across the nodes.
And here we go to your next question:
It might be possible that at some time, we will be looking for data for those userId's that will be on another Service Instance Infinispan cache? Right? So what will happen in that scenario? Will infinispan take care of those things as well?
Infinispan was developed for exactly this scenario, so you don't need to worry about it at all. If you have, for example, 4 nodes and configure distribution mode with numOwners=2, your cached data will live on exactly 2 nodes at any moment. When you issue a GET command on a non-owner node, the entry will be fetched from an owner.
You can also set the clustering mode to replication, where all nodes contain all entries. Please read more about the modes here: https://docs.jboss.org/author/display/ISPN/Clustering+modes and choose what is best for your use case.
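As a sketch, a distributed cache with two owners would look roughly like this in the 5.x-era XML schema (the cache name is a placeholder; check the schema for your Infinispan version):

```xml
<!-- Hypothetical distributed cache: each entry is kept on exactly 2 nodes -->
<namedCache name="userCache">
  <clustering mode="distribution">
    <sync/>
    <hash numOwners="2"/>
  </clustering>
</namedCache>
```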
Additionally, when you add a new node to the cluster, a state transfer takes place and entries are synchronized/rebalanced across the cluster. Non-blocking state transfer is already implemented, so your cluster will still be able to serve responses during the joining phase. See: https://community.jboss.org/wiki/Non-BlockingStateTransferV2
Similarly for removed/crashed nodes: there is an automatic rebalancing process, so, for example, entries (with numOwners=2) that after a crash live on only one node will be replicated so that they again live on 2 nodes, according to the numOwners property of distribution mode.
To sum it up, your cluster stays up to date, and it does not matter which node you ask for a particular entry: if that node does not contain it, the entry is fetched from an owner.
if yes, then what configuration setup I need to have to make sure this thing is working fine.
The aforementioned getting-started guide is full of examples, plus you can find some sample configuration files in the Infinispan distribution: ispn/etc/config-samples/*
I would suggest you also take a look at this source: http://refcardz.dzone.com/refcardz/getting-started-infinispan where you can find even more basic and very quick configuration examples.
That source also provides information relevant to your first question: "Should I use embedded mode or remote client-server mode?" From my point of view, using a remote cluster is the more enterprise-ready solution (see: http://howtojboss.com/2012/11/07/data-grid-why/). Your caching layer is then easily scalable, highly available and fault tolerant, and it is independent of your database and application layers because it simply sits between them.
You might be interested in this feature as well: https://docs.jboss.org/author/display/ISPN/Cache+Loaders+and+Stores
I think the newest Infinispan release supports a special compatibility mode for users interested in accessing Infinispan in multiple ways.
Follow the link below to configure your cache environment to support either embedded or remote access:
Interoperability between Embedded and Remote Server Endpoints
Question:
Does a Java client need to worry about multiple servers?
Meaning:
I have given two servers to the memcached client, but when I set or get a key from the cache, do I need to provide any server-related info, or does memcached itself take care of it?
My knowledge:
Memcached itself takes care of it, due to consistent hashing.
But does spymemcached 2.8.0 provide consistent hashing?
Memcached servers are pooling servers, meaning that you define a pool (a list) of servers, and when the Java client attempts a write it writes towards the pool.
It's the client's job to decide which server from the pool will receive and store the value, and how it will retrieve the value from that pool.
Basically this allows you to start with one memcached server (possibly on the same machine), and if push comes to shove you can add a few dozen more servers to the pool without touching the application code.
Since the client is responsible for distributing data across the pool of servers (the client has to choose the right memcached server to store/fetch data), there are a few distribution algorithms.
One of the simplest is modulo. This algorithm distributes keys depending on the number of memcached servers in the pool; if that number changes, the client won't be able to find the stored data and there will be cache misses. In such cases it's better to use consistent hashing.
The most popular Java memcached clients, spymemcached and xmemcached, both support consistent hashing.
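To illustrate the idea only (this is not spymemcached's actual implementation, which uses MD5-based ketama hashing via its KetamaConnectionFactory), a toy consistent-hash ring might look like this:

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy consistent-hash ring showing how a memcached client can map keys to
 *  servers. Real clients use stronger hashes and tuned virtual-node counts. */
public class HashRing {
    private static final int VNODES = 100;  // virtual nodes per server
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    /** Places VNODES points on the ring for this server. */
    public void addServer(String server) {
        for (int i = 0; i < VNODES; i++)
            ring.put(hash(server + "#" + i), server);
    }

    /** Walks clockwise from the key's hash to the next server point. */
    public String serverFor(String key) {
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue()
                              : tail.get(tail.firstKey());
    }

    private static int hash(String s) {
        // FNV-1a: a simple stand-in for the MD5-based ketama hash
        int h = 0x811c9dc5;
        for (byte b : s.getBytes()) { h ^= b; h *= 0x01000193; }
        return h;
    }
}
```

The point of the ring: when a server is added or removed, only the keys whose nearest clockwise point changed are remapped, unlike modulo, which remaps almost everything.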
In some use cases, instead of using the memcached client directly, caching can be added to a Spring application through AOP (interceptors) using simple-spring-memcached or the Spring 3.1 Cache Abstraction. Spring Cache currently doesn't support memcached, but simple-spring-memcached provides such integration in a snapshot build and the upcoming 3.0.0 release.
The memcached client manages storing and retrieving key/value pairs by itself.
When storing, it hashes the key and stores the value on the resulting server.
When retrieving, it hashes the given key again to find which server the value was stored on, and then fetches it; this lookup takes some time.
Instead, there is one approach that can be used for storing and retrieving: create a HashMap and store each key with its server address as the value. The next time the same key is needed, instead of searching you get the server address directly from the HashMap and only need to fetch the value from that server, saving the lookup time.
Hope you understand what I mean.
I would like to develop a Java EE web application that requires Prolog, via JPL, for certain search related tasks.
The web application will be deployed in the JBoss application server.
The Prolog engine can be either YAP or SWI (afaik the only Prolog engines compatible with JPL at the moment).
The Prolog queries depend on information stored in a (potentially large) database.
If someone has tried this or something similar, could you please give me feedback on the following questions?
What is the best way to manage concurrent HTTP sessions that need access to the Prolog engine? Is it possible (desirable?) to assign each separate session its own Prolog engine? If that works, would it be possible to implement something like a 'Prolog engine pool' to quickly assign engines to new sessions? Or is the best solution a single Prolog engine that handles all query requests synchronously (and slowly)?
How could the interaction between Prolog and the database be managed? If the data in the database changes often and Prolog needs that data to solve its queries, what is the best strategy to keep the facts in the Prolog engine synchronized with the database? The naive option of starting from scratch in each new session (i.e., reloading all the data from the database as Prolog facts) does not seem like a good idea if the database grows large.
Are there any other issues or difficulties to expect from the Java-Prolog-database interaction during implementation?
Thanks in advance!
What is the best way to manage concurrent http sessions that need access to the Prolog engine?.
If I look at the source of JPL, it looks like it uses an engine pool. The query data type implements the enumerator pattern plus a close() operation; I guess an engine is automatically assigned to a query as long as the query is active.
So each HTTP request can independently access the Prolog system via new query objects. If you don't want to close your query object during an HTTP request, I guess you can also attach it to an HTTP session and reuse it in another request.
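If you did want explicit pooling at the application level, a generic sketch could look like the following; the engine type is a placeholder for whatever engine handle you manage (JPL itself is not used here, since it handles engine assignment internally as noted above):

```java
import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Generic sketch of an engine pool: a fixed set of engines handed out to
 *  requests and returned afterwards. E stands in for a Prolog engine handle. */
public class EnginePool<E> {
    private final BlockingQueue<E> idle;

    public EnginePool(Collection<E> engines) {
        // capacity == pool size, seeded with all engines idle
        idle = new ArrayBlockingQueue<>(engines.size(), false, engines);
    }

    /** Blocks until an engine is free, then hands it to the caller. */
    public E acquire() throws InterruptedException { return idle.take(); }

    /** Returns the engine so another request can use it. */
    public void release(E engine) { idle.offer(engine); }
}
```

A servlet would acquire() at the start of a request and release() in a finally block, so a slow query only ever blocks requests once all engines are busy.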
How could be managed the interaction of Prolog with the database ?
This depends on the usage pattern of the data in the database and the available access paths. It may be that you can access even a very large database quickly and refetch the data during each request, for example if the needed matching data set is small and the database has good indexes, so that the matching data can be retrieved quickly.
Otherwise you would need to implement some intelligent caching. I am currently working on a solution where I use a kind of check-in/check-out pattern. But this is not suitable for a web server, where you have multiple users; I am using this pattern in a standalone solution where there is one user and one checked-out data chunk in memory. For a web server with many varying users, the chunks could overflow the web server's memory.
So caching only works if you can limit and throttle the chunks, or if you have a very large web server memory. Maybe you can find such an invariant for your application. Otherwise the conclusion could be that you cannot use Java EE, independent of whether you use Prolog or not.