we are implementing a global search that we can use to query all entities in our application, let's say cars, books and movies. Those entities do not share common fields, cars have a manufacturer, books have an author and movies have a director, for example.
In a global search field I'd like to search for all my entities with one query to be able to use pagination. Two aproaches come to my mind when thinking about how to solve this:
Query one index after another and manually merge the result. This means that I have to implement pagination myself.
Add common fields to each item like name and creator (or create an interface as shown here Single return type in Hibernate Search).In this case I can only search for fields in my global search that I map to the common fields.
My question is now: Is there a third (better) way? How would you suggest to implement such a global search on multiple indices?
Query one index after another and manually merge the result. This means that I have to implement pagination myself.
I definitely wouldn't do that, as this will perform very poorly, especially with deep pagination (page 40, etc.).
Add common fields to each item like name and creator (or create an interface as shown here Single return type in Hibernate Search).In this case I can only search for fields in my global search that I map to the common fields.
That's the way. You don't even need a common interface since you can just target multiple fields in the same predicate. The common interface would only help to target all relevant types: you can call .search(MyInterface.class) instead of .search(Arrays.asList(Car.class, Book.class, Movie.class)).
You can still apply predicates to fields that are specific to each type; it's just that fields that appear in more than one type must be consistent (same type, etc.). Also, obviously, if you require that the "manufacturer" (and no other field) matches "james", Books and Movies won't match anymore, since they don't have a manufacturer.
Have you tried it? For example, this should work just fine as long as manufacturer, author and director are all text fields with the same analyzer:
SearchResult<Object> result = searchSession.search( Arrays.asList(
Car.class, Book.class, Movie.class
) )
.where( f -> f.simpleQueryString()
.fields( "manufacturer", "author", "director" )
.matching( "james" ) )
.fetch( 20 );
List<Object> hits = result.hits(); // Result is a mix of Car, Book and Movie.
One approach would be to create a SQL view (SearchEntry?) that combines all of the tables you want to search. This allows you to alias your different column names. It won't be very good for performance but you could also just create one big field that is a concatenation of different searchable fields. Finally, include a "type" field that you tie back to your entity.
Now you can query everything in one go and use the type/id to tie back to a specific entity that the "search" data was initially pulled from.
I have a requirement where i need to store different application config data , so just wanted to know the best possible way to to store it.
Currently I am using below fields for my db
id, name ,category ,city,value(text field), int_value, float_value, string_value,date_value,bool_value
value field will be used to store complex data which in turn are json object capable of storing different key value pair.
for eg.
value:
{ "is_enabled" : true,
"list_of_applicable_ids" : ["123","345","567","890"]
}
And the reason I have added different value data type field(int_value, float_value) because it will be easy to query on that fields and index will make this even faster.
So just wanted to know the better approach to store these kind of data in db.
will using only value fields enough for my requirements ?
Frequency of config changes are very less( once or twice in a month)
Take a look a file based databases.
An example would be MongoDB.
Mongo uses a data form that closely resembles JSON (called BSON), and could in your case store the entire config file without the need to predefined it's fields.
I need to change field names in elastic search response (ex. change "title" to "header"). i want to avoid parsing the Json response which take much time.
is there any way to do that?
i'm afraid this might not be available in elasticsearch. you might have to parse the response. consider
Aliasing
One of the things introduced in Apache Solr 4.0 and not available in ElasticSearch right now is the ability to transform result documents. First of all Solr allows you to alias returned fields, so for example you can return field price_usd or price_eur as price depending on your needs. The second thing is the ability to return values returned by functions as a (pseudo) field in the result (or fields). Solr also has the ability to return fields which start with a given prefix (for example all fields starting with price). Apart from the ability to get a function value as a field added to matched documents on the fly other functionalities are not ground breaking, though they can be handy in some cases.
from http://blog.sematext.com/2012/10/01/solr-vs-elasticsearch-part-3-searching/
I have list of nested Json objects which I want to save into cassandra (1.2.15).
However the constraint I have is that I do not know the column family's column data types before hand i.e each Json object has got a different structure with fields of different datatypes.
So I am planning to use dynamic composite type for creating a column family.
So I would like to know if there is an API or suggest some ideas on how to save such Json object list into cassandra.
Thanks
If you don't need to be able to query individual items from the json structure, just store the whole serialized string into one column.
If you do need to be able to query individual items, I suggest using one of the collection types: list, set, or map. As far as typing goes, I would leave the value as text or blob and rely on json to handle the typing. In other words, json encode the values before inserting and then json decode the values when reading.
Most projects have some sort of data that are essentially static between releases and well-suited for use as an enum, like statuses, transaction types, error codes, etc. For example's sake, I'll just use a common status enum:
public enum Status {
ACTIVE(10, "Active");
EXPIRED(11, "Expired");
/* other statuses... */
/* constructors, getters, etc. */
}
I'd like to know what others do in terms of persistence regarding data like these. I see a few options, each of which have some obvious advantages and disadvantages:
Persist the possible statuses in a status table and keep all of the possible status domain objects cached for use throughout the application
Only use an enum and don't persist the list of available statuses, creating a data consistency holy war between me and my DBA
Persist the statuses and maintain an enum in the code, but don't tie them together, creating duplicated data
My preference is the second option, although my DBA claims that our end users might want to access the raw data to generate reports, and not persisting the statuses would lead to an incomplete data model (counter-argument: this could be solved with documentation).
Is there a convention that most people use here? What are peoples' experiences with each and are there other alternatives?
Edit:
After thinking about it for a while, my real persistence struggle comes with handling the id values that are tied to the statuses in the database. These values would be inserted as default data when installing the application. At this point they'd have ids that are usable as foreign keys in other tables. I feel like my code needs to know about these ids so that I can easily retrieve the status objects and assign them to other objects. What do I do about this? I could add another field, like "code", to look stuff up by, or just look up statuses by name, which is icky.
We store enum values using some explicit string or character value in the database. Then to go from database value back to enum we write a static method on the enum class to iterate and find the right one.
If you expect a lot of enum values, you could create a static mapping HashMap<String,MyEnum> to translate quickly.
Don't store the actual enum name (i.e. "ACTIVE" in your example) because that's easily refactored by developers.
I'm using a blend of the three approaches you have documented...
Use the database as the authoritative source for the Enum values. Store the values in a 'code' table of some sort. Each time you build, generate a class file for the Enum to be included in your project.
This way, if the enum changes value in the database, your code will be properly invalidated and you will receive appropriate compile errors from your Continuous Integration server. You have a strongly typed binding to your enumerated values in the database, and you don't have to worry about manually syncing the values between code and the data.
Joshua Bloch gives an excellent explanation of enums and how to use them in his book "Effective Java, Second Edition" (p.147)
There you can find all sorts of tricks how to define your enums, persist them and how to quickly map them between the database and your code (p.154).
During a talk at the Jazoon 2007, Bloch gave the following reasons to use an extra attribute to map enums to DB fields and back: An enum is a constant but code isn't. To make sure that a developer editing the source can't accidentally break the DB mapping by reordering the enums or renaming then, you should add a specific attribute (like "dbName") to the enum and use that to map it.
Enums have an intrinsic id (which is used in the switch() statement) but this id changes when you change the order of elements (for example by sorting them or by adding elements in the middle).
So the best solution is to add a toDB() and fromDB() method and an additional field. I suggest to use short, readable strings for this new field, so you can decode a database dump without having to look up the enums.
While I am not familiar with the idea of "attributes" in Java (and I don't know what language you're using), I've generally used the idea of a code table (or domain specific tables) and I've attributed my enum values with more specific data, such as human readable strings (for instance, if my enum value is NewStudent, I would attribute it with "New Student" as a display value). I then use Reflection to examine the data in the database and insert or update records in order to bring them in line with my code, using the actual enum value as the key ID.
What I used in several occations is to define the enum in the code and a storage representation in the persistence layer (DB, file, etc.) and then have conversion methods to map them to each other. These conversion methods need only be used when reading from or writing to the persistent store and the application can use the type safe enums everywhere. In the conversion methods I used switch statements to do the mapping. This allows also to throw an exception if a new or unknown state is to be converted (usually because either the app or the data is newer than the other and new or additional states had been declared).
If there's at least a minor chance that list of values will need to be updated than it's 1. Otherwise, it's 3.
Well we don't have a DBA to answer to, so our preference is for option 2).
We simply save the Enum value into the database, and when we are loading data out of the database and into our Domain Objects, we just cast the integer value to the enum type.
This avoids any of the synchronisation headaches with options 1) and 3). The list is defined once - in the code.
However, we have a policy that nobody else accesses the database directly; they must come through our web services to access any data. So this is why it works well for us.
In your database, the primary key of this "domain" table does't have to be a number. Just use a varchar pk and a description column (for the purposes your dba is concerned). If you need to guarantee the ordering of your values without relying on the alphabetical sor, just add a numeric column named "order or "sequence".
In your code, create a static class with constants whose name (camel-cased or not) maps to the description and value maps to the pk. If you need more than this, create a class with the necessary structure and comparison operators and use instances of it as the value of the constants.
If you do this too much, build a script to generate the instatiation / declaration code.