Are there any in-memory/caching solutions for java that allow for a form of Querying for specific attributes of objects in the Cache?
I realize this is something that a full blown database would be used for, but I want to be able to have the speed/performance of a cache with the Querying ability of a database.
JBoss Cache has search functionality. It's called JBossCacheSearchable. From the site:
This is the integration package between JBoss Cache and Hibernate Search. The goal is to add search capabilities to JBoss Cache. We achieve this by using Hibernate Search to index user objects as they are added to the cache and modified. The cache is queried by passing in a valid Apache Lucene query which is then used to search through the indexes and retrieve matching objects from the cache.
Main JBoss Cache page: http://www.jboss.org/jbosscache/
JBossCacheSearch: http://www.jboss.org/community/docs/DOC-10286
Nowadays the answer should be Infinispan, the successor of JBoss Cache, which has much-improved search technology.
Terracotta or GBeans or POJOCache
At first, HSQLDB came to mind, but that's an in-memory relational database rather than an object database. You might want to look at this list. There are a few object databases there, one of which might meet your needs.
Look at db4o, a rather lightweight Java object database. You can even query the data using regular Java code:

List<Student> students = database.query(new Predicate<Student>() {
    public boolean match(Student student) {
        return student.getAge() < 20
            && student.getGrade().equals(gradeA);
    }
});
(From this article).
Another idea is to use Lucene and a RAMDirectory implementation of Directory to index what you put into your cache. That way, you can query using all the search engine query features which Lucene provides.
In your case, you will probably index the relevant properties of your objects as-is (without using an Analyzer) and query using a boolean equality operator.
Lucene is very lightweight, performant, and thread-safe, and its memory consumption is low.
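To illustrate the idea without pulling in Lucene itself, here is a minimal sketch of such an exact-match attribute index using plain Java maps (all names are illustrative; Lucene's real index is far more capable):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of an exact-match attribute index like the one described above:
// each (field, value) pair maps to the ids of the cached objects holding
// that value, so equality queries never scan the whole cache.
class AttributeIndex {
    private final Map<String, Map<String, Set<String>>> index = new HashMap<>();

    void put(String objectId, String field, String value) {
        index.computeIfAbsent(field, f -> new HashMap<>())
             .computeIfAbsent(value, v -> new HashSet<>())
             .add(objectId);
    }

    Set<String> query(String field, String value) {
        return index.getOrDefault(field, Map.of()).getOrDefault(value, Set.of());
    }
}
```

On every cache put you would also update this index; Lucene's boolean queries then correspond to intersecting or uniting the returned id sets.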
You might want to check out this library:
http://casperdatasets.googlecode.com
This is a dataset technology. It supports tabular data (either from a database or constructed in code), and you can then construct queries, filters, and sorts against the dataset, all in memory. It is fast and easy to use. Most importantly, you can perform queries against any column or attribute of the dataset.
If the result set is large, then having the entire result set in memory (server cache e.g. hazelcast) will not be feasible. With large result sets, you cannot afford to have them in memory. In such case, you have to fetch a chunk of data at a time (query based paging). The down side of using query based paging, is that there will be multiple calls to the database for multiple page requests.
Can anyone suggest how to implement a hybrid approach of it.
I haven't put any sample code here since I think the question is more about a logic instead of specific code. Still if you need sample code I can put it.
Thanks in advance.
The most effective solution is to use the primary key as a paging criterion. This lets us rely on first-class constructs like a between range query, which is simple for the RDBMS to optimize, and the primary key of the queried entity will most likely be indexed already.
Retrieving data using a range query on the primary key is a two-step process: first retrieve the collection of primary keys and generate the intervals that identify proper subsets of the data, then run the actual queries against the data.
This approach is almost as fast as the brute-force version, while memory consumption is about one tenth. By selecting an appropriate page size for this implementation, you can trade execution time against memory consumption. This version is also stateless: it does not keep references to resources the way the ScrollableResults version does, nor does it strain the database like the version using setFirstResult/setMaxResults.
Effective pagination using Hibernate
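The two-step keyset approach can be sketched in plain Java. Here a sorted list stands in for the key query; a real implementation would fetch the keys with something like "select t.id from Transaction t order by t.id" and then fetch each page with a between range query (names hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of keyset (primary-key range) paging. Step 1: retrieve the ordered
// primary keys. Step 2: split them into [first, last] intervals, one per page.
// Each interval is then fetched with "where t.id between :first and :last".
class KeysetPager {
    static List<long[]> intervals(List<Long> sortedKeys, int pageSize) {
        List<long[]> ranges = new ArrayList<>();
        for (int i = 0; i < sortedKeys.size(); i += pageSize) {
            int end = Math.min(i + pageSize, sortedKeys.size()) - 1;
            ranges.add(new long[] { sortedKeys.get(i), sortedKeys.get(end) });
        }
        return ranges;
    }
}
```

Only the key list is held in memory between requests, which is where the roughly ten-fold memory saving over loading full entities comes from.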
I have a legacy system that allows users to manage some entities called "TRANSACTION" in the (MySQL) DB, mapped to a Transaction class in Java. Transaction objects have about 30 fields; some of them are columns in the DB, some are joins to other tables, like CUSTOMER, PRODUCT, COMPANY and so on.
Users have access to a "Search" screen, where they are allowed to search using a TransactionId and a couple of extra fields, but they want more flexibility. Basically, they want to be able to search using any field in TRANSACTION or any linked table.
I don't know how to make the search both flexible and quick. Is there any way? I don't think having an index for every combination of columns is a valid solution, but full table scans aren't valid either... Is there any reasonable design? I'm using Criteria to build the queries, but that's not the problem.
Also, I think MySQL is not using the right indexes: when I make Hibernate log the SQL, I can almost always improve the response time by forcing an index... I'm starting to use something like this trick adapted to Criteria to force a specific index, but I'm not proud of the "if" chain. I'm getting something like
if (queryDto.getFirstName() != null) {
    // force index "IDX_TX_BY_FIRSTNAME"
} else if (queryDto.getProduct() != null) {
    // force index "IDX_TX_BY_PRODUCT"
}
and it feels horrible
Sorry if the question is "too open", I think this is a typical problem, but I can't find a good approach
Hibernate is very good at writing data, while SQL still excels at reading it. JOOQ might be a better alternative in your case, and since you're using MySQL it's free of charge anyway.
JOOQ is like Criteria on steroids: you can build more complex queries using almost the exact syntax you'd use for native querying, with type-safety and all the features your current DB has to offer.
As for indexes, you can't simply index every field combination. It's better to index the most-used ones and to use compound indexes that cover as many use cases as possible. Sometimes the query executor will not use an index because it's faster without it, so it's not always a good idea to force one; what works in your test environment might not hold for the production system.
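If you do keep the index-forcing logic, one way to avoid the growing if/else chain from the question is an ordered lookup table from "which search field is set" to the index hint to force; a minimal sketch with hypothetical class and field names:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical search DTO matching the question's example.
class QueryDto { String firstName; String product; }

// Ordered table of condition -> index hint; the first matching entry wins,
// mirroring the original if/else chain but keeping the rules in one place.
class IndexHintResolver {
    private final Map<Predicate<QueryDto>, String> hints = new LinkedHashMap<>();

    IndexHintResolver() {
        hints.put(q -> q.firstName != null, "IDX_TX_BY_FIRSTNAME");
        hints.put(q -> q.product != null, "IDX_TX_BY_PRODUCT");
    }

    String resolve(QueryDto q) {
        for (Map.Entry<Predicate<QueryDto>, String> e : hints.entrySet())
            if (e.getKey().test(q)) return e.getValue();
        return null; // no hint: let the optimizer decide
    }
}
```

Adding a new searchable field then means adding one entry to the table instead of another else-if branch.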
I am trying to decide whether I should use the App Engine Search API or the Datastore for an App Engine Connected Android project. The only distinction that the Google documentation makes is:
... an index search can find no more than 10,000 matching documents. The App Engine Datastore may be more appropriate for applications that need to retrieve very large result sets.
Given that I am already very familiar with the Datastore, can someone help me decide, assuming I don't need more than 10,000 results?
Are there any advantages to using the Search API versus using Datastore for my queries (per the quote above, it seems sensible to use one or the other)? In my case the end user must be able to search, update existing entries, and create new entities. For example if my app is a bookstore, the user must be able to add new books, add reviews to existing books, search for a specific book.
My data structure is such that the content will be supplied by the end user. Document vs Datastore entity: which is cheaper to update? $$, etc.
Can they supplement each other: Datastore and Search API? What's the advantage? Why would someone consider pairing the two? What's the catch/cost?
Some other info:
The datastore is a transactional system, which is important in many use cases. The search API is not: for example, you can't put and delete a document in a search index in a single transaction.
The datastore has a lot in common with a NoSql DB like Cassandra, while the search API is really a textual search engine, very similar to something like Lucene. If you understand how a reverse index works, you'll get a better understanding of how the search API works.
A very good reason to combine usage of the datastore API and the search API is that the datastore makes it very difficult to do some types of queries (e.g. free text queries, geospatial queries) that the search API handles very easily. Thus, you could store your main entities in the datastore, but then use the search API if you need to search in ways the datastore doesn't allow. Down the road, I think it would be great if the datastore and search API were more tightly integrated, for example by letting you do free text search against indexed Text fields, where app engine would automatically create a search Document Index behind the scenes for you.
The key difference is that with the Datastore you cannot search inside entities. If you have a book called "War and peace", you cannot find it if a user types "war peace" in a search box. The same with reviews, etc. Therefore, it's not really an option for you.
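The reverse index mentioned above can be sketched in a few lines of plain Java; this toy version is enough to show why a query like "war peace" finds "War and Peace" (class and method names are illustrative):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal sketch of the reverse (inverted) index underlying engines like
// Lucene and the Search API: each term maps to the set of document ids
// containing it, so a term lookup is a map access instead of a full scan.
class InvertedIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    void add(String docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty())
                postings.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
        }
    }

    // Returns all documents containing every query term.
    Set<String> search(String query) {
        Set<String> result = null;
        for (String term : query.toLowerCase().split("\\W+")) {
            Set<String> docs = postings.getOrDefault(term, Set.of());
            if (result == null) result = new HashSet<>(docs);
            else result.retainAll(docs);
        }
        return result == null ? Set.of() : result;
    }
}
```

The Datastore has no equivalent of this structure for entity contents, which is exactly why "war peace" cannot match a stored title there.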
The most serious con of Search API is Eventual Consistency as stated here:
https://developers.google.com/appengine/docs/java/search/#Java_Consistency
It means that when you add or update a record with the Search API, the change may not be reflected immediately. Imagine a case where a user uploads a book or updates his account settings, and nothing seems to change because the update hasn't propagated to all servers yet.
I think Search API is only good for one thing: Search. It basically acts as a search engine for your data in Datastore.
So my advice is to keep data for which the user expects immediate results in the Datastore, and use the Search API for data where the user won't expect immediate results.
The Datastore only provides a few query operators (=, !=, <, >); nested filters and multiple inequalities are either costly or impossible (timeouts), and search results may contain a lot of false positives. You can do partial string search by tokenizing, but this will bloat your entity. The best way around these limitations is to use Structured Properties and/or Ancestor Queries.
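The tokenizing trick mentioned above can be sketched like this: precompute the word prefixes when saving the entity, then use a plain equality filter on the stored token list (names illustrative; the size of this set is exactly what bloats the entity):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of partial string search via tokenizing: every prefix of every word
// becomes a token stored on the entity, so an equality filter such as
// "tokens = :userInput" behaves like a prefix match.
class Tokenizer {
    static Set<String> prefixTokens(String title) {
        Set<String> tokens = new HashSet<>();
        for (String word : title.toLowerCase().split("\\W+")) {
            for (int i = 1; i <= word.length(); i++)
                tokens.add(word.substring(0, i));
        }
        return tokens;
    }
}
```

A repeated property holding these tokens lets the Datastore answer "starts with" queries it otherwise cannot express.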
The Search API, on the other hand, runs a full-text search on search documents, which is faster and more accurate than NDB queries, without relying on tokenized data. The downside is that you have to keep the documents up to date with your data.
Use the Datastore to process your data (create, update, delete), then run a task that stores this data as documents grouped into indexes, and run your searches using the Search API.
I am using displaytag for pagination.
Now the DB has millions of records, and moving from one page to the next takes quite a long time.
Is there a way to cache the objects that need to be shown, so that traversing between the pages is faster?
Requirement: we are querying and displaying the number of files in directories under a Linux environment. Each folder has thousands of files.
How are you reading from the DB? It would be good to see more of your implementation.
As a general guideline:
If you read all your data into a list from the DB and only display one page, you will be wasting resources (processing and memory). This can kill your app. Try an approach that only fetches the page you need.
If you are using a framework like Hibernate, you can implement caching and paging without much trouble.
If you are using direct JDBC, you will have to limit the records in your query. The proper technique may depend on the database engine you're using, so please provide that information.
Be aware that your problem might be the amount of read information rather than a caching problem (just depends on the implementation).
As an example, in Oracle you would need to know the page and the page size. With both, you can limit the query with something like "where rownum <= pagesize * page" and navigate to the first record you need with the absolute(int) method of ResultSet. Other engines may offer more efficient options.
Now, if you're paginating with some framework, they normally support some implementation of a "DataProvider", so you can control how results are fetched for each page.
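A hypothetical minimal form of such a "DataProvider" contract, with a toy in-memory implementation; a real one would issue a bounded SQL query per page instead of holding a list:

```java
import java.util.List;

// The paging component asks only for the rows of the current page, so the
// full result set is never materialized.
interface DataProvider<T> {
    int totalCount();                          // used to render page links
    List<T> fetchPage(int offset, int limit);  // e.g. backed by a LIMIT/OFFSET query
}

// Toy list-backed implementation for illustration only.
class ListDataProvider<T> implements DataProvider<T> {
    private final List<T> rows;

    ListDataProvider(List<T> rows) { this.rows = rows; }

    public int totalCount() { return rows.size(); }

    public List<T> fetchPage(int offset, int limit) {
        int to = Math.min(offset + limit, rows.size());
        return offset >= to ? List.of() : rows.subList(offset, to);
    }
}
```

With this shape, caching can be added per page (keyed by offset and limit) without ever pulling the millions of rows into memory.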
I am using Spring Security with ACLs to secure the documents in my application. On the other hand, I use Hibernate Search (on top of Lucene) to search for the documents. This search also supports paging. (The indexed documents are only metadata of documents stored in a database.)
FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(entityManager);
QueryBuilder queryBuilder = fullTextEntityManager.getSearchFactory().buildQueryBuilder().forEntity(Document.class).get();
Query query = queryBuilder.keyword().onFields(fieldNames.toArray(new String[0])).matching(searchQuery)
.createQuery();
FullTextQuery fullTextQuery = fullTextEntityManager.createFullTextQuery(query, Document.class);
fullTextQuery.setFirstResult(pageable.getFirstItem());
fullTextQuery.setMaxResults(pageable.getPageSize());
Now I have to combine the paging with the ACLs. The only idea I have at the moment is to remove the paging from the FullTextQuery, read all search result documents, filter them by their ACLs, and then do the paging by hand. But I don't like that solution, because it loads all the documents instead of only the ones for the current page.
Does anybody have a better idea?
If your ACL is not too complex, that is, if you have a small, finite number of levels, then I suggest using a Filter and a BitSet to implement it.
And here you'll find additional examples of ACL implementation with Filters:
http://java.dzone.com/articles/how-implement-row-level-access
Here you'll find a cached BitSet filter implementation which has been in production for at least 5 years (it's my open-source webapp for a searchable parallel text corpus). Look for the addSourceFilter method:
http://code.google.com/p/hunglish-webapp/source/browse/trunk/src/main/java/hu/mokk/hunglish/lucene/LuceneQueryBuilder.java
I have hit the same problem too and I don't think there is a simple answer.
I think there are only two solutions. The one you have suggested has the performance problems you've described: you have to load the documents and resolve the ACL for each result, then do your own paging. The alternative is to push this work to the indexing side and index your ACL in Lucene. This gives you the search performance, hiding the results a user can't see by adding filter terms based on the current user/group/permissions/roles, but at the expense of maintaining the index with ACL information. If your ACL is simple, this may be an option. If your ACL is hierarchical, it's still an option but more complicated. It's also tricky to keep your index up to date with the ACL.
The fact that you are starting to look into this sort of functionality may indicate that you are beginning to stretch your Database/Hibernate/Lucene solution. Maybe a content repository like Jackrabbit would be a better fit? This is probably a step too far, but it may be worth taking a look to see how it handles the problem. Alternatively, take a look at SOLR, particularly this issue, which describes what a thorny problem it is.
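A toy sketch of the "index the ACL" approach, with plain Java maps standing in for the Lucene index: each document is indexed with the groups allowed to read it, and results are filtered to the current user's groups before any paging happens (all names hypothetical):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Each document is stored together with the ids of the groups allowed to
// read it; in Lucene these would be extra filter terms on the document.
class AclFilteredSearch {
    private final Map<String, Set<String>> allowedGroupsByDoc = new HashMap<>();

    void index(String docId, Set<String> allowedGroups) {
        allowedGroupsByDoc.put(docId, new HashSet<>(allowedGroups));
    }

    // Returns only the hits the user may see; paging can then be applied
    // directly to this already-filtered result, with no post-load ACL check.
    Set<String> visibleDocs(Set<String> userGroups) {
        Set<String> visible = new HashSet<>();
        allowedGroupsByDoc.forEach((doc, groups) -> {
            for (String g : userGroups)
                if (groups.contains(g)) { visible.add(doc); return; }
        });
        return visible;
    }
}
```

The maintenance cost the answer mentions is visible here too: every ACL change must re-run index() for the affected documents.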
Here is my implementation of a complex, hierarchical User/Group/Role ACL system using pure Lucene queries (on top of Hibernate Search).