Apache Solr with Hibernate-Java or with MS SQL Server - java

I have a problem with SQL Server performance because of a heavy calculation query,
so we decided to put Solr in between and index all the data, either from Hibernate or directly from SQL Server.
Can anybody suggest whether this is possible, or help me with it?
Please suggest any tutorial link for this.

You can use DataImportHandler to transfer data, which you can schedule using DataImportScheduler.
I had a similar problem where a SQL Server stored procedure took 12 hours to update relationships between objects (rows), so we ended up using Neo4j (an open source graph database), which exactly matched our data model.
We needed object relationships to be reflected in Solr searches, e.g. give me all objects whose name starts with "obj" and whose parent is of type "typ".
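If you would rather push documents from Java yourself instead of configuring DataImportHandler, a rough SolrJ sketch along these lines could work (assuming a recent SolrJ, 6+); the JDBC URL, Solr core name, table and field names below are all made up for illustration:

// Sketch only: stream rows from SQL Server via JDBC and push them to Solr with SolrJ.
// Connection strings, table and field names are placeholders.
import java.sql.*;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class SqlServerToSolr {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                 "jdbc:sqlserver://dbhost;databaseName=mydb", "user", "pass");
             HttpSolrClient solr = new HttpSolrClient.Builder(
                 "http://localhost:8983/solr/mycore").build();
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT id, name, amount FROM heavy_table")) {

            while (rs.next()) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", rs.getString("id"));
                doc.addField("name", rs.getString("name"));
                doc.addField("amount", rs.getBigDecimal("amount"));
                solr.add(doc);          // queue the document for indexing
            }
            solr.commit();              // make the documents searchable
        }
    }
}

DataImportHandler does essentially the same work declaratively from its configuration, which is usually less code to maintain for a straight table-to-index copy.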

Related

Mapping a single row with a BLOB column using iBatis is very slow

Please help me crack this issue: a single row is being retrieved from an Oracle DB that contains a BLOB column (the BLOB is about 350 KB) and mapped to a Java object using iBatis 2.5, but the mapping part (the result is mapped to a resultMap) takes around 40 seconds to complete. Do you know what the bottleneck might be in this situation?
The easiest way to find a bottleneck on the Oracle side is to examine the wait events of the session running the SQL. Use v$session and/or v$active_session_history (if AWR is licensed).
Sometimes "idle" wait events on the Oracle side point to bottlenecks on the client or the network.

Informix, MySQL and Oracle blob contains

We have an application that runs with any of IBM Informix, MySQL and Oracle, and we are using Java with Hibernate to connect to the database. We will store XML, CSV and other text-based files inside the database (clob column). The entities in Java are byte[] objects.
One feature request to the application is now to "grep" content inside the data. So I need to find all files with a specific content.
On regular char/varchar fields I can use like '%xyz%', but this is not working on byte[] / blobs.
The first approach was to load each entity, cast the byte[] into a String and use the contains method in Java. If the user enters any filter parameters on other (non-clob) columns, I apply those filters before testing the clob in order to reduce the number of blobs I have to scan.
That worked quite well for 100 files (clobs) as long as the application and database are on the same server. But I think it will get really slow if I have 1,000,000 files inside the database and the database is not always on the same network. So I think that is not a good idea.
My next thought was creating a database procedure, but I am not quite sure whether that is possible for all of Informix, MySQL and Oracle.
The last, and least favored, option is to not store the content inside a clob at all. Maybe I can use a different datatype for that?
Does anyone have a good idea how to realize this? I need a solution for all three DBMS. The application knows what kind of DBMS it is connected to, so it would be okay to have three different solutions (one for each DBMS).
I am completely open to changing what kind of datatype I use (BLOB, CLOB ...) — I can modify that as I want.
Note: the clobs will range from about 5 KiB to about 500 KiB, with a maximum of 1 MiB.
Look into Apache Lucene or another text indexing library.
https://en.wikipedia.org/wiki/Lucene
http://en.wikipedia.org/wiki/Full_text_search
If you go with a DB-specific solution like Oracle Text you will have to implement a custom solution for each database. I know from experience that Oracle Text takes significant time to learn and involves a lot of tweaking to get just right.
Also, if you use a DB solution you would receive different results in each DB even if the data sets were the same (each DB has its own methods of indexing and retrieving the data).
By going with a third-party solution like Lucene, you only have to learn one solution and the results will be consistent regardless of the DB.
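To make that concrete, here is a minimal sketch against a recent Lucene API (roughly 5.x-8.x); the field names, the Directory you pass in and the idea that the CLOB text arrives as a String are all assumptions:

// Sketch only: index each stored file's text once, then search the index
// instead of scanning CLOBs with LIKE. In real code you would keep a single
// IndexWriter open rather than creating one per document.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.*;
import org.apache.lucene.store.Directory;

public class ClobIndexSketch {

    public static void index(Directory dir, long fileId, String clobText) throws Exception {
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            doc.add(new StringField("id", Long.toString(fileId), Field.Store.YES));
            doc.add(new TextField("content", clobText, Field.Store.NO));
            writer.addDocument(doc);
        }
    }

    public static void search(Directory dir, String term) throws Exception {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            Query q = new QueryParser("content", new StandardAnalyzer()).parse(term);
            for (ScoreDoc hit : searcher.search(q, 100).scoreDocs) {
                System.out.println("match: file id " + searcher.doc(hit.doc).get("id"));
            }
        }
    }
}

The point is that each file gets indexed once when it is written (and re-indexed on update), so the grep-style search never has to pull CLOB contents across the network again.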

What is the best approach for mongodb query with sum and sort

So I'm moving some of my code from SQL to MongoDB and there are a few things that are not yet very clear to me.
Let's say I have the following simple SQL query (just an example):
select count(a.id) as count, b_id
from table
where c_id = [SOME ID]
group by b_id
order by count desc;
I assume everyone understands what that does.
Now with Mongo I can take several approaches: do it all on the Mongo side and fetch the summed, sorted results, or just pull the raw data to the client side and do all the processing there.
What would be the best approach for the query above: do it all in the database with some internal MongoDB mechanism (map-reduce etc.), or fetch the collection to the client side and process it there? The dataset will generally be huge, but the query can be split into several parts if necessary.
The client is Java based if that matters.
With the upcoming MongoDB Aggregation Framework it's pretty easy to do what you need to do. It's already available in 2.1.x development releases.
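As a hedged sketch of what that pipeline could look like, here is the rough equivalent of the SQL above using the newer MongoDB Java driver helpers (a much later API than the 2.1.x release mentioned); the database, collection and SOME_ID value are placeholders:

// Sketch only: match on c_id, group by b_id with a count, sort by count desc.
import com.mongodb.client.*;
import org.bson.Document;
import java.util.Arrays;
import static com.mongodb.client.model.Aggregates.*;
import static com.mongodb.client.model.Accumulators.sum;
import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Sorts.descending;

public class CountByBId {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> coll =
                client.getDatabase("mydb").getCollection("mycoll");
            AggregateIterable<Document> results = coll.aggregate(Arrays.asList(
                match(eq("c_id", "SOME_ID")),              // where c_id = [SOME ID]
                group("$b_id", sum("count", 1)),           // group by b_id, count(*)
                sort(descending("count"))));               // order by count desc
            for (Document d : results) {
                System.out.println(d.toJson());
            }
        }
    }
}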
If you're stuck on 2.0 or earlier you'll have to look at either the options you mention or schema changes that avoid having to do on-the-spot aggregation in the first place. For example, it's pretty common in NoSQL to maintain a field or document with the aggregated data as the source data is manipulated. The most common example is maintaining the size of an array as a field:
update({...}, {$push: {array: element}, $inc: {elementCount: 1}})
You can group data on the Mongo side using Map/Reduce and then sort it either on the client side or the Mongo side. You can also find a map/reduce example here.

Java MS SQL -> MySQL conversion

I am building an application at work and need some advice. I have a somewhat unique problem in which I need to gather data housed in an MS SQL Server and transplant it to a MySQL server every 15 minutes.
I have done this previously in C# with a DataGrid, but now I am trying to build a Java version that I can run on an Ubuntu server, and I cannot find a similar model in Java.
Just to give a little background:
When I pull the data from the MS SQL Server, it always has 9 columns, but could have anywhere from 0 to 1000 rows.
Before blindly inserting into the MySQL server, I do manipulate some of the data:
I convert a time column to CST based on a STATE column
I strip some characters to prevent SQL injection
I tried using the ResultSet, but I am having issues with the "forward only result set" rules.
What would be the best data structure to hold that information, manipulate it, and then parse it to insert later into MySQL?
This sounds like a job for PreparedStatements!
Defined here: http://download.oracle.com/javase/6/docs/api/java/sql/PreparedStatement.html
Quick example: http://download.oracle.com/javase/tutorial/jdbc/basics/prepared.html
PreparedStatements allow you to batch up sets of data before pushing them into the target database. They also let you bind values with PreparedStatement.setString, so the driver handles escaping for you instead of you stripping characters by hand.
For the time conversion thing, I would retrieve the STATE value from the row and then retrieve the time value. Before calling PreparedStatement.setDate, convert the time to CST if necessary.
I don't think you would need all the overhead that an ORM tool requires.
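A rough sketch of that approach, with invented table and column names (the real query has 9 columns) and a placeholder convertToCst() helper standing in for the STATE-based time conversion:

// Sketch only: read from SQL Server, adjust the time column, batch-insert into MySQL.
import java.sql.*;
import java.util.*;

public class TransferJob {
    public static void transfer(Connection mssql, Connection mysql) throws SQLException {
        List<Object[]> rows = new ArrayList<>();
        try (Statement st = mssql.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT col1, col2, state_col, time_col FROM source_table")) {
            while (rs.next()) {
                Timestamp t = convertToCst(rs.getTimestamp("time_col"), rs.getString("state_col"));
                rows.add(new Object[] { rs.getString("col1"), rs.getString("col2"),
                                        rs.getString("state_col"), t });
            }
        }
        String insert = "INSERT INTO target_table (col1, col2, state_col, time_col) VALUES (?, ?, ?, ?)";
        try (PreparedStatement ps = mysql.prepareStatement(insert)) {
            for (Object[] row : rows) {
                ps.setString(1, (String) row[0]);     // parameter binding, so no
                ps.setString(2, (String) row[1]);     // manual escaping is needed
                ps.setString(3, (String) row[2]);
                ps.setTimestamp(4, (Timestamp) row[3]);
                ps.addBatch();
            }
            ps.executeBatch();                        // push up to ~1000 rows in one round trip
        }
    }

    // placeholder for the STATE-based timezone adjustment described above
    private static Timestamp convertToCst(Timestamp t, String state) {
        return t;
    }
}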
You could consider using an ORM technology like Hibernate. This might seem a little heavyweight at first, but it means you can maintain the various table mappings for various databases with ease as well as having the power of Java's RegEx lib for any manipulation requirements.
So you'd have a Java class that represents the source table (with its Hibernate mapping) and another Java class that represents the target table and lastly a conversion utility class that does any manipulation of that data. Hibernate takes care of the CRUD SQL for you, so no need to worry about Database specific SQL (as long as you get the mapping correct).
It also lessens the SQL injection problem.
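A hedged sketch of that source-entity / target-entity / converter layout; every class, table and column name here is invented, and the CST conversion is left as a stub:

// Sketch only: JPA/Hibernate-mapped source and target rows plus a converter class.
import javax.persistence.*;
import java.util.Date;

@Entity
@Table(name = "source_table")          // MS SQL side
class SourceRow {
    @Id Long id;
    @Column(name = "state_col") String state;
    @Column(name = "time_col") @Temporal(TemporalType.TIMESTAMP) Date time;
    // ... remaining mapped columns
}

@Entity
@Table(name = "target_table")          // MySQL side
class TargetRow {
    @Id Long id;
    String state;
    @Temporal(TemporalType.TIMESTAMP) Date time;
    // ... remaining mapped columns
}

class RowConverter {
    TargetRow convert(SourceRow src) {
        TargetRow t = new TargetRow();
        t.id = src.id;
        t.state = src.state;
        t.time = toCst(src.time, src.state);   // any regex cleanup can happen here too
        return t;
    }

    private Date toCst(Date d, String state) {
        return d;                              // placeholder for the real conversion
    }
}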

Caching of DB data in java

I have listing screens in my web app that pull quite a heavy amount of data from an Oracle database.
Each time a listing screen loads, it goes to the DB and pulls data.
What I want is some caching technique that can extract data from the DB and keep it in memory, so that when the next request is made the data comes from there. Just like with the DB, I should be able to filter that data with any SQL query, except that it won't go to the DB but will pull the data from memory. That extracted data set would be just like a view of the table, and it should continuously monitor the corresponding tables, so that if any update operation is made on the table it fetches a fresh set of data from the DB and serves that.
Is there any API in Java to achieve the same?
In ADO.NET they have something like a RecordSet... I don't know much about that.
So is there any way out? My app is based on J2EE with Oracle as the DB, and we have JBoss as the server. Any suggestion is welcome. Thanks.
Try using Ehcache; it supports JDBC caching. And avoid creating custom solutions if you're not a JDBC guru.
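For illustration, a minimal sketch with the classic net.sf.ehcache 2.x API; the cache name, the key and the loadFromOracle() helper are made up, and the real cache settings would live in ehcache.xml:

// Sketch only: cache a listing query result in memory and invalidate it manually
// when the underlying table changes.
import java.util.List;
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class ListingCache {
    private final Cache cache = CacheManager.getInstance().getCache("listingCache");

    @SuppressWarnings("unchecked")
    public List<Object[]> getListing(String queryKey) {
        Element hit = cache.get(queryKey);
        if (hit != null) {
            return (List<Object[]>) hit.getObjectValue();   // served from memory
        }
        List<Object[]> rows = loadFromOracle(queryKey);      // fall back to the DB
        cache.put(new Element(queryKey, rows));
        return rows;
    }

    public void onTableUpdated() {
        cache.removeAll();   // crude invalidation when the underlying table changes
    }

    private List<Object[]> loadFromOracle(String queryKey) {
        // placeholder: run the real JDBC query here
        return java.util.Collections.emptyList();
    }
}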
You could cache the results of your query in memcached.
When your application modifies the table that you're caching, delete the cached item out of your memcached instances.
I found this quick guide to be useful: http://pragprog.com/titles/memcd/using-memcached
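A small sketch of that pattern using the spymemcached client (one common Java memcached client, not necessarily the one the guide above uses); the key and TTL are placeholders:

// Sketch only: cache a query result, and delete the key when the table is modified.
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class QueryResultCache {
    public static void main(String[] args) throws Exception {
        MemcachedClient mc = new MemcachedClient(new InetSocketAddress("localhost", 11211));

        String key = "listing:dept=42";
        Object cached = mc.get(key);
        if (cached == null) {
            Object fresh = "rows loaded from Oracle";   // placeholder for the real query result
            mc.set(key, 300, fresh);                    // cache for 5 minutes
        }

        // when the application modifies the underlying table:
        mc.delete(key);

        mc.shutdown();
    }
}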
You can store that data in an in-memory dataset.
Give this library a try:
http://casperdatasets.googlecode.com
You can iterate and scroll through the results just like a ResultSet, issue queries on it, sort the data, and create indexes to optimize searches, and it's all memory-based.
I have 2 options for this:
1) JBoss Cache; you can check all the details at the following link:
JBoss Cache
