Historical data search in SQL database - java

I have a use case that requires searching a normalized SQL database of historical data, holding more than a few million records, for given criteria. A stored procedure that joins the normalized tables returns the right results, but it is very slow.
Is there an alternative where we load the data into memory and perform the search there?
I would like to know how to approach this problem.

You could set up Elasticsearch, which caches frequently executed searches.

Use Apache Solr, which can handle large data sets and supports faceted search.
https://lucene.apache.org/solr/

Related

Appengine Search API vs Datastore

I am trying to decide whether I should use the App Engine Search API or the Datastore for an App Engine connected Android project. The only distinction that the Google documentation makes is:
... an index search can find no more than 10,000 matching documents.
The App Engine Datastore may be more appropriate for applications that
need to retrieve very large result sets.
Given that I am already very familiar with the Datastore: Will someone please help me, assuming I don't need 10,000 results?
Are there any advantages to using the Search API versus using Datastore for my queries (per the quote above, it seems sensible to use one or the other)? In my case the end user must be able to search, update existing entries, and create new entities. For example if my app is a bookstore, the user must be able to add new books, add reviews to existing books, search for a specific book.
My data structure is such that the content will be supplied by the end user. Document vs Datastore entity: which is cheaper to update? $$, etc.
Can they supplement each other: Datastore and Search API? What's the advantage? Why would someone consider pairing the two? What's the catch/cost?
Some other info:
The datastore is a transactional system, which is important in many use cases. The search API is not. For example, you can't put and delete a document in a search index in a single transaction.
The datastore has a lot in common with a NoSQL DB like Cassandra, while the search API is really a textual search engine, very similar to something like Lucene. If you understand how an inverted index (sometimes called a reverse index) works, you'll get a better understanding of how the search API works.
A very good reason to combine usage of the datastore API and the search API is that the datastore makes it very difficult to do some types of queries (e.g. free text queries, geospatial queries) that the search API handles very easily. Thus, you could store your main entities in the datastore, but then use the search API if you need to search in ways the datastore doesn't allow. Down the road, I think it would be great if the datastore and search API were more tightly integrated, for example by letting you do free text search against indexed Text fields, where app engine would automatically create a search Document Index behind the scenes for you.
The key difference is that with the Datastore you cannot search inside entities. If you have a book called "War and peace", you cannot find it if a user types "war peace" in a search box. The same with reviews, etc. Therefore, it's not really an option for you.
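To make the reverse (inverted) index idea and the "War and peace" example concrete, here is a minimal sketch in plain Java. It is not the Search API or Lucene; the class name TinyInvertedIndex and its methods are hypothetical, and the tokenizer is a deliberately naive assumption (lowercase, split on anything that is not a letter or digit). It only shows why a query like "war peace" can match a title that an exact-match property filter would miss.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index (hypothetical): maps each lowercased token to the ids of documents containing it. */
class TinyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Naive tokenizer: lowercases the text and splits on non-alphanumeric characters. */
    static String[] tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"))
                .filter(token -> !token.isEmpty())
                .toArray(String[]::new);
    }

    /** Records every token of the text under the given document id. */
    void add(int docId, String text) {
        for (String token : tokenize(text)) {
            postings.computeIfAbsent(token, k -> new HashSet<>()).add(docId);
        }
    }

    /** Returns ids of documents that contain every token of the query (AND semantics). */
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String token : tokenize(query)) {
            Set<Integer> ids = postings.getOrDefault(token, Collections.<Integer>emptySet());
            if (result == null) {
                result = new TreeSet<>(ids);   // first token seeds the candidate set
            } else {
                result.retainAll(ids);         // later tokens narrow it down
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "War and Peace");
        index.add(2, "The Art of War");
        // Only document 1 contains both "war" and "peace".
        System.out.println(index.search("war peace"));
    }
}
```

A real search engine adds stemming, ranking, and persistence on top of this structure, but the lookup path is the same: tokens in, document ids out.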
The most serious con of Search API is Eventual Consistency as stated here:
https://developers.google.com/appengine/docs/java/search/#Java_Consistency
It means that when you add or update a record with the Search API, the change may not be visible immediately. Imagine a case where a user uploads a book or updates an account setting, and nothing appears to change because the update hasn't reached all servers yet.
I think Search API is only good for one thing: Search. It basically acts as a search engine for your data in Datastore.
So my advice is to keep data for which the user expects an immediate result in the Datastore, and use the Search API to search data for which the user won't expect an immediate result.
The Datastore only provides a few query operators (=, !=, <, >). Nested filters and multiple inequalities are either costly or impossible (timeouts), and search results may contain a lot of false positives. You can do partial string search by tokenizing, but this will bloat your entity. The best way around these limitations is to use Structured Properties and/or Ancestor Queries.
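As an illustration of why tokenizing for partial string search bloats an entity, here is a small plain-Java sketch. It assumes one common reading of "tokenizing", namely storing every prefix of every word as a separate searchable value; the class name PrefixTokens is hypothetical and nothing here calls the Datastore.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: generates the prefix tokens one might store alongside an entity. */
class PrefixTokens {
    /** Returns every prefix of every whitespace-separated word, lowercased, in first-seen order. */
    static Set<String> prefixes(String text) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String word : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            for (int end = 1; end <= word.length(); end++) {
                tokens.add(word.substring(0, end));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> tokens = prefixes("War and Peace");
        // A 13-character title already needs 11 stored tokens.
        System.out.println(tokens.size() + " tokens: " + tokens);
    }
}
```

The token count grows with the total length of the text, so a long description or review multiplies the stored values quickly, which is the bloat the answer refers to.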
Search API on the other hand runs a Full Text search on Search Documents, which is faster and more accurate than NDB queries without relying on tokenized data. Downside is it relies on data staying up to date.
Use the Datastore to process your data (create, update, delete), then run a function that puts the data into search documents grouped into indexes, and run the searches using the Search API.

Lucene full text joined with database criteria

I have an app with case records stored in a Derby database, and am using Lucene to full-text index case notes and descriptions. The full text is relatively static, but some database fields can change daily on many records, so updating Lucene from the database is not a good option.
What I want to do is allow the user to do a full-text query along with some SQL criteria. For example: all cases that have the words "water" and "melon" (the full-text portion) that were edited in the last 2 days, and their "importance" flag is set to "medium" (the SQL portion). (the full-text query could be much more complex, and similarly for the SQL portion).
This involves a "join" (actually an "AND") of the full-text results with the DB results. I can either run a full-text search and check each record against the DB criteria, or vice versa, depending on whether the full-text or the SQL criteria yield the smaller number of records. This is obviously a slow process.
Are there better/faster solutions?
We have done something like this. We used a Postgres database but, as you said, returning all the matching ids from the database to check them against the index is too slow.
In our case, we only needed to update flags for a document. Fortunately, we could afford a somewhat expensive update operation on the database. So we decided to keep one blob entry per flag, containing all matching document ids as a serialized ArrayList or something similar. When searching, we only needed to retrieve one entry from the database, which was fast enough for a million ids.
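The "AND" of the full-text hits with the ids stored for a flag comes down to a set intersection. A minimal sketch, assuming both sides have already been loaded into memory as sets of long ids; the class name IdJoin and the parameter names are hypothetical and not taken from the answer above. It iterates the smaller set and probes the larger one, which mirrors the "start from whichever side yields fewer records" idea in the question.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: combines full-text hit ids with ids that satisfy the database criteria. */
class IdJoin {
    /** Returns ids present in both sets, iterating the smaller set and probing the larger one. */
    static Set<Long> intersect(Set<Long> fullTextIds, Set<Long> flagIds) {
        Set<Long> small = fullTextIds.size() <= flagIds.size() ? fullTextIds : flagIds;
        Set<Long> large = small == fullTextIds ? flagIds : fullTextIds;
        Set<Long> both = new HashSet<>();
        for (Long id : small) {
            if (large.contains(id)) {
                both.add(id);
            }
        }
        return both;
    }

    public static void main(String[] args) {
        Set<Long> fullTextIds = new HashSet<>(Arrays.asList(1L, 2L, 3L, 4L)); // e.g. ids of full-text hits
        Set<Long> flagIds = new HashSet<>(Arrays.asList(3L, 4L, 5L));         // e.g. ids deserialized from one flag entry
        System.out.println(intersect(fullTextIds, flagIds));
    }
}
```

With hash sets the cost is proportional to the size of the smaller side, so a million ids on the flag side is not a problem as long as the full-text result is small.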

How to handle efficient database connection and performance in java?

I have 5000 records as a search result and, based on the product number, have to pull the related data associated with each product number. That means separating out the 5000 product numbers and sending them to the database to pull the data. Creating one query and hitting the database for each product number is not efficient.
I'm looking for some idea to handle this situation.
Note:using hibernate and oracle and java
You got that search result from some query; it might be simpler to reuse that query with a join to retrieve the related data.
Instead of 5000 queries to get the result, you may use the IN clause.
You should probably split it in chunks, however, since such long SQL queries can throw errors, or use a temporary table and do a JOIN. Take a look at this.
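A minimal sketch of the chunking approach, assuming the product numbers are already in a list. The table and column names (product_detail, product_number) and the class name InClauseChunks are hypothetical. The chunk size of 1000 is chosen because Oracle rejects IN lists with more than 1000 expressions (ORA-01795). The sketch only builds the parameterized SQL text; binding each chunk's values and executing the statement is left to JDBC or Hibernate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: splits a long id list into chunks and builds one IN query per chunk. */
class InClauseChunks {
    /** Splits the list into consecutive sublists of at most chunkSize elements. */
    static <T> List<List<T>> chunk(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    /** Builds a parameterized query with one '?' placeholder per value (table and column are made up). */
    static String buildQuery(int valueCount) {
        String placeholders = String.join(", ", Collections.nCopies(valueCount, "?"));
        return "SELECT * FROM product_detail WHERE product_number IN (" + placeholders + ")";
    }

    public static void main(String[] args) {
        List<Integer> productNumbers = new ArrayList<>();
        for (int i = 1; i <= 5000; i++) {
            productNumbers.add(i);
        }
        // 5000 ids in chunks of 1000 means 5 round trips instead of 5000.
        for (List<Integer> part : chunk(productNumbers, 1000)) {
            System.out.println(part.size() + " ids, query length " + buildQuery(part.size()).length());
        }
    }
}
```

Using placeholders rather than concatenating the values keeps the statement safe from injection and lets the database reuse the parsed statement for every full-size chunk.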
Maybe you could use a materialized view and some basic paging? http://docs.oracle.com/cd/A97630_01/server.920/a96567/repmview.htm

Hibernate Feasibility for Single table database

I have to design a web application that retrieves data from a huge single table with 40 columns and several thousand rows for select queries, and updates a few rows/columns.
Can you please tell me whether Hibernate is a feasible choice for fast performance, given that I only have a single table and no joins?
Or should I use a JDBC DAO?
database : sql server 2008
java 7
If you use Hibernate correctly, there is no problem fetching an arbitrarily large result set. Just avoid bare from queries (use select ... from ... queries instead) and use ScrollableResults. With plain JDBC you will get started quicker, because Hibernate needs to be configured first, you need to write the mapping file, etc. Later on, though, Hibernate might pay off, since the code you write will be much simpler. Hibernate is very good at taking the boilerplate out of client code.
If you want to retrieve several thousand records and pagination is not possible, performance might become an issue, because Hibernate creates an object for every row and stores it in its persistence context. Creating too many objects uses up a lot of memory. For this type of operation JDBC is better. For a similar discussion, see "Hibernate performance issues using huge databases".

Caching solutions and Querying

Are there any in-memory/caching solutions for java that allow for a form of Querying for specific attributes of objects in the Cache?
I realize this is something that a full blown database would be used for, but I want to be able to have the speed/performance of a cache with the Querying ability of a database.
JBoss Cache has search functionality. It's called JBossCacheSearchable. From the site:
This is the integration package between JBoss Cache and Hibernate Search. The goal is to add search capabilities to JBoss Cache. We achieve this by using Hibernate Search to index user objects as they are added to the cache and modified. The cache is queried by passing in a valid Apache Lucene query which is then used to search through the indexes and retrieve matching objects from the cache.
Main JBoss Cache page: http://www.jboss.org/jbosscache/
JBossCacheSearch: http://www.jboss.org/community/docs/DOC-10286
Nowadays the answer should be updated to Infinispan, the successor of JBoss Cache, which has much improved search technology.
Terracotta or GBeans or POJOCache
At first, HSQLDB came to mind, but that's an in-memory relational database rather than an object database. Might want to look at this list. There are a few object databases there, one of which might meet your needs.
Look at db4o, a rather lightweight Java object database. You can even query the data using regular Java code:
// Native query: keep students younger than 20 whose grade equals gradeA
List<Student> students = database.query(new Predicate<Student>() {
    public boolean match(Student student) {
        return student.getAge() < 20
            && student.getGrade().equals(gradeA);
    }
});
(From this article).
Another idea is to use Lucene and a RAMDirectory implementation of Directory to index what you put into your cache. That way, you can query using all the search engine query features which Lucene provides.
In your case, you will probably index the relevant properties of your objects as-is (without using an Analyzer) and query using a boolean equality operator.
Lucene is lightweight, fast, and thread-safe, and its memory consumption is low.
You might want to check out this library:
http://casperdatasets.googlecode.com
This is a dataset technology. It supports tabular data (either from a database or constructed in code), and you can then construct queries and filters against the dataset (and sort it), all in memory. It's fast and easy to use. Most importantly, you can perform queries against any column or attribute of the dataset.
