Generate Search SQL from HTTP GET request parameters


We have a Java web app with a Hibernate backend that provides REST resources. We now need to implement a generic search that is controlled by the query parameters of a GET request:
some/rest/resource?name_like=foo&created_on>=2012-09-12&sort_by_asc=something
or similar.
We don't want to predefine all possible parameters (name, created_on, something).
We don't want to have to analyze the request string to pick up control characters (like >=),
nor do we want to implement our own grammar to reflect things like _eq, _like, _goe and so on (as an alternative or addition to control characters).
Is there some kind of framework that provides help with this mapping from GET request parameters to database query?
Since we know which REST resource we're GETting, we have the entity / table (select). It will probably also be necessary to predefine the JOINs that will be executed in order to limit the depth of a search.
But other than that we want the REST consuming client to be able to execute any search without us having to predefine how a certain parameter and a certain control sequence will get translated into a search.
Right now I'm trying a semi-automatic solution built on Mysema's Querydsl. It lets me predefine the where columns and sort columns, and I'm working on a simple string comparison to detect things like '_like', '_loe', ... in a parameter name and then activate the corresponding predefined part of the search. It's not much different from an SQL string, except that it's SQL-injection-proof and type-safe.
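The suffix detection described above could look roughly like the following. This is only a minimal sketch in plain Java, not Querydsl code: the class name SearchParamParser, the Criterion type, the suffix table and the allowedFields whitelist are all hypothetical names introduced for illustration. The whitelist could, for instance, be derived from the entity's mapped properties instead of being typed in by hand.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: splits a parameter name such as "name_like" into field and operator. */
final class SearchParamParser {

    /** Operators that can be selected by suffix (illustrative set). */
    enum Op { EQ, LIKE, GOE, LOE }

    /** One parsed search criterion: field, operator and raw value. */
    static final class Criterion {
        final String field;
        final Op op;
        final String value;

        Criterion(String field, Op op, String value) {
            this.field = field;
            this.op = op;
            this.value = value;
        }

        @Override
        public String toString() {
            return field + " " + op + " " + value;
        }
    }

    /** Suffix -> operator table; the suffix spellings are assumptions. */
    private static final Map<String, Op> SUFFIXES = new LinkedHashMap<>();
    static {
        SUFFIXES.put("_like", Op.LIKE);
        SUFFIXES.put("_goe", Op.GOE);
        SUFFIXES.put("_loe", Op.LOE);
        SUFFIXES.put("_eq", Op.EQ);
    }

    /** Parses one query parameter; rejects fields that are not whitelisted. */
    static Criterion parse(String paramName, String value, Set<String> allowedFields) {
        String field = paramName;
        Op op = Op.EQ; // no recognised suffix means plain equality
        for (Map.Entry<String, Op> e : SUFFIXES.entrySet()) {
            if (paramName.endsWith(e.getKey())) {
                field = paramName.substring(0, paramName.length() - e.getKey().length());
                op = e.getValue();
                break;
            }
        }
        if (!allowedFields.contains(field)) {
            throw new IllegalArgumentException("Unknown search field: " + field);
        }
        return new Criterion(field, op, value);
    }

    public static void main(String[] args) {
        Set<String> allowed = new HashSet<>(Arrays.asList("name", "created_on"));
        System.out.println(parse("name_like", "foo", allowed));
        System.out.println(parse("created_on_goe", "2012-09-12", allowed));
    }
}
```

Each Criterion would then be handed to whatever builds the actual predicate, so the raw parameter text never ends up inside a query string.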
However I still have to tell my search object that it should be able to potentially handle a query "look for a person with name like '???'". Right now this is okay as we only consume the REST resource internally and isolate the actual search creation quite well. If we need to make a search do more we can just add more predefinitions for now. But should we make our REST resources public at some time in the future that won't be so great.
So we're wondering: there has to be some framework, best practice or recommended solution for approaching this. We're not the first who want this. Redmine, for example, offers all of its resources via a REST interface and I can query at will. Or Facebook with its Graph API. I'm sure those guys didn't just predefine all possibilities but rather created some generic grammar. We'd like to save as much as possible on that effort and use available solutions instead.
Like I said, we're using Hibernate, so an SQL or HQL solution would be fine, or anything that builds on entities like Querydsl. (There's also the security issue concerning SQL injection.)
Any suggestions? Ideas? Will we just have to do it all ourselves?

From a .NET perspective the closest thing I can think of would be a WCF data service.
Take a look at the URI conventions specified on the OData website. There is some good information in section 4.5, Filter System Query Option. You'll notice that a lot of the examples on this site are .NET related, but there are other suggestions for getting this to work with Java.

Related

LDAP search fails for certain characters on some attributes

I have a query which looks something like this:
(|(mail=andrew*)(cn=andrew*)(sn=andrew*)(telephoneNumber=andrew*))
i.e. it takes a search term from a UI and looks for a match against term* across a bunch of attributes.
The user enters andrew in this case and the app adds the wildcard. If the user enters andrew` (trailing back tick) the app looks for andrew`*.
I've noticed that if telephoneNumber is included in the searched attributes the query fails with a javax.naming.InvalidAttributeValueException, if it's excluded then the query works without error.
I'm not particularly interested in the backtick alone, but as it's not a special character in LDAP searches I'm wondering why I get this behaviour and if other characters will produce similar results. Is there going to be something in the schema that explains this if I can figure out how to query it, or will it be something else?
If it matters, accessing via a Spring library in a Java app.

The query probably fails because of attribute value constraints on the telephoneNumber attribute. The syntax for telephoneNumber attributes is described in this RFC. Back tick does indeed seem to be an invalid character in telephoneNumber values.
Now, I might be wrong, but reading your question it appears you are trying to construct filters using string concatenation. Please note that you should never, ever build queries of any sort using string concatenation, especially when parts of the queries come from user input. I'm sure you know this is the case for SQL queries, and it's equally true when you're using LDAP.
Spring LDAP provides two ways to help you build LDAP queries. The preferred approach is to use the LDAP query API documented here and (advanced usage) here. The old, deprecated - but still functioning - way to do it is using the filter classes, documented in the old reference documentation here.
Using these utilities you won't need to keep track of which characters need to be encoded and when. You also eliminate any risk for query injection attacks.
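To illustrate what such utilities take off your hands, here is a minimal sketch of escaping a user-supplied value for an LDAP search filter according to RFC 4515. It is plain Java and not Spring LDAP's own code; the class and method names (LdapFilterEscaper, escapeValue, prefixFilter) are made up for this example.

```java
/** Hypothetical helper that escapes values for LDAP search filters (RFC 4515). */
final class LdapFilterEscaper {

    /** Replaces the characters that are special inside a filter value with \XX hex escapes. */
    static String escapeValue(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': sb.append("\\5c"); break;
                case '*':  sb.append("\\2a"); break;
                case '(':  sb.append("\\28"); break;
                case ')':  sb.append("\\29"); break;
                case '\0': sb.append("\\00"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds a "starts with" filter; the trailing wildcard is added by the application. */
    static String prefixFilter(String attribute, String term) {
        return "(" + attribute + "=" + escapeValue(term) + "*)";
    }

    public static void main(String[] args) {
        System.out.println(prefixFilter("cn", "andrew"));
        System.out.println(prefixFilter("cn", "a*(b)"));
    }
}
```

Note that the back tick passes through unchanged, since it is not special in filter syntax. Escaping only protects the filter itself; it does not stop the server from rejecting a value that violates an attribute's syntax, as with telephoneNumber.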

Spring Data Rest: How to search by another object's key?

In Spring-Data-Rest an object's ID is the URI returned with the object. For example, if I have a User, it might have a URI like:
http://example.com/users/1
In an authentic REST api, this URI is the id of the object, you are not supposed to use just '1' as the id.
Given that, how do I search for all Orders that belong to that User?
http://example.com/orders/search/findByUser?user={{XXX}}
Specifically, what do I use for {{XXX}}?
I know I could do the opposite search:
http://example.com/users/1/orders
But in my case I need to search for matching jobs so I can add additional parameters which are also keys.
I can export /orders/search/findByUser by creating this function definition on OrderRepository:
List<Order> findByUser(User user);
and findByUser will be exported by Spring-Data-REST, but how do I specify the User as a GET parameter?
Again, I'm specifically looking for the pure REST solution, since the Spring Data Rest project is trying to encourage purity.

You might take a look at the Query annotation of Spring Data. It enables you to execute custom queries without the need of a custom controller.
Edit:
Query parameters are a good way to filter a resource by simple properties. As SDR serializes all complex types as relations, it is even clearer that filtering only applies to the remaining (simple) properties.
If you have only one relation, you correctly mentioned the way of doing the 'inverse' search as you called it by using /users/1/orders.
If you want to search by multiple relations, I suggest you define a separate search (sub-)resource and perform the search by issuing a POST request to this resource.
For example:
POST /orders/search
{
"user": "http://example.org/users/1",
...
}
This way, SDR will correctly translate the URIs into entities. However, I think you will need to use a custom controller here but it should be possible to still use the Spring Data repository and provide the user and additional entities as parameter.
For further information, see the following SO-questions:
How to design RESTful search/filtering?
RESTful URL design for search
Edit2:
Addressing the point that using a POST for searching violates the REST spec:
REST was designed to be simple. One of the key advantages of REST is that you are not forced to do anything. You can adapt the spec until it fits your needs. Of course, this could mean that your API is less RESTful but you should consider if it is worth it to strictly stick to the spec if it introduces an unnecessary overhead for the consumers of your API.
Of course you can design the above idea to fully meet the REST spec. This would involve creating a separate search entity, persisting it to the database and later retrieving the result of the search with a call to a subresource like /result or something like that. However, the question is whether it is worth it.
In your concrete example, I would just require the client to parse the ID out of the link and provide it as query parameter. If you are scaling up your application later, you could introduce a feature like named searches and apply the above mentioned solution.
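Parsing the ID out of the link on the client side could be sketched like this. It assumes that the last path segment of the resource URI is a numeric ID, which holds for the example URLs in the question but is not guaranteed in general; the class and method names are hypothetical.

```java
import java.net.URI;

/** Hypothetical client-side helper for turning a resource link into a plain ID. */
final class ResourceLinks {

    /** Returns the last path segment of the link as a long, e.g. ".../users/1" -> 1. */
    static long idFromLink(String link) {
        String path = URI.create(link).getPath();
        if (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1); // tolerate a trailing slash
        }
        String lastSegment = path.substring(path.lastIndexOf('/') + 1);
        return Long.parseLong(lastSegment);
    }

    /** Builds the search URL from the question using the extracted ID. */
    static String ordersByUserUrl(String userLink) {
        return "http://example.com/orders/search/findByUser?user=" + idFromLink(userLink);
    }

    public static void main(String[] args) {
        System.out.println(ordersByUserUrl("http://example.com/users/1"));
    }
}
```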

If you use a controller, like it seems to be your case, you can pass it any parameter(s) you consider necessary. Take a look at this question:
Spring Data REST custom query integration

See https://jira.spring.io/browse/DATAREST-502
Depending on your version of Spring Data, it may or may not work as you want.
By the way, I still think the POST should be an option too, it would be much cleaner.

Indexing external rest api with solr, possible?

This question is maybe a weird one, but my employer has asked me to find out and thus I will.
In our application we use an external REST api to search for some data. This REST api has the possibility of delivering many types of data, but it is only possible to look up one type of data at a time. For example city names and street names. In our app we force the users to choose what data type to look for as they search, but now our users don't want to do this. So if they search for example 'los' they want the result to contain both "Los Angeles" and 'Losing Street'. For this to be possible for us right now, we would have to do two separate searches in the REST API and merge the results.
So instead my employer has read about Solr and is adamant that it is possible to index the REST API so that we use Solr to search for what we want in one search request. I am not so sure. Is it possible, and is it feasible?
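For comparison, the two-search fallback mentioned above (one request per data type, then merging) can be sketched in a few lines. The lookups are modelled as plain functions from a search term to a list of hits; how they call the external REST API is left out, and all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical helper that runs several typed lookups in parallel and merges the hits. */
final class CombinedSearch {

    /** Starts every lookup asynchronously, then concatenates the results in lookup order. */
    static List<String> searchAll(String term, List<Function<String, List<String>>> lookups) {
        List<CompletableFuture<List<String>>> futures = new ArrayList<>();
        for (Function<String, List<String>> lookup : lookups) {
            futures.add(CompletableFuture.supplyAsync(() -> lookup.apply(term)));
        }
        List<String> merged = new ArrayList<>();
        for (CompletableFuture<List<String>> future : futures) {
            merged.addAll(future.join()); // waits for this lookup to finish
        }
        return merged;
    }

    public static void main(String[] args) {
        // Stand-ins for the real REST calls (city lookup and street lookup).
        Function<String, List<String>> cities = term -> Arrays.asList("Los Angeles");
        Function<String, List<String>> streets = term -> Arrays.asList("Losing Street");
        List<Function<String, List<String>>> lookups = new ArrayList<>();
        lookups.add(cities);
        lookups.add(streets);
        System.out.println(searchAll("los", lookups));
    }
}
```

Because the requests run concurrently, the total latency is roughly that of the slowest lookup rather than the sum of both, although relevance ranking across types is still missing.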

Yes, it is definitely possible to come up with a solution for the requirement described above. Solr is basically a full-text search engine, and all fields are indexed in Solr by default. You can carry out different types of operations on these fields through combinations of analyzers and tokenizers. You can map all the searchable fields to one specific field using copy fields (for example city name and street name -> text name) and run your search against that one field to get the desired result.
Solr is a RESTful search engine, and it serves data in XML and, optionally, JSON format. It's a really useful platform for operating on huge amounts of data, but it doesn't help much with the analytics part, such as calculations.
A few of the benefits include auto-suggest, highlighting, facets, synonym search, n-gram search, auto-correct etc.

I think you should send a feature request to the REST API maintainer to support a composite search.
The only other thing you can do is download the whole database from the REST API and create your own database, which you can then index and search with your custom queries, and which you have to keep in sync with the REST API. I don't think you want to do that. It will work, but so-called REST APIs usually don't decouple clients from the implementation of the service with links and semantic annotations, so I am afraid it will break easily with any change to the API.
AFAIK Solr is a storage solution which supports full-text search and has a REST interface.
Solr is a standalone enterprise search server with a REST-like API. You put documents in it (called "indexing") via XML, JSON, CSV or binary over HTTP. You query it via HTTP GET and receive XML, JSON, CSV or binary results.

You should have no trouble posting the data from the REST API to Solr using the Data Import Handler (DIH), Solr's RESTful interface, or something like Spring Data Solr once you actually have the data. The tricky part is how you will "crawl" the third-party REST API data.
Depending on whether the REST API provider gives you any way to paginate through the data, i.e. chronologically or alphabetically, you may be able to write a program outside of Solr that polls the REST API then stores the data in a local database before posting it to Solr. This will be easier if the REST API provider allows you to retrieve new or changed records updated after a certain time, so that your polling is efficient and only retrieves a small amount of data after the initial full indexing. Some REST providers allow using webhooks to notify your application that they have updated data in their API. This may or may not be feasible depending on the amount of data and whether you can limit it by user account, etc. to only contain what you need.
It's important to store the third party data in a local database outside of Solr, since Solr's index data files are volatile and sometimes need to be deleted after making configuration changes. That way, you can write a process to repost the data from your database to Solr without having to crawl the REST API again.
For handling the polling at regular intervals, you could use something like Apache Camel or Spring Integration along with Quartz Scheduler. Both of those support REST endpoints and you can also take a look at the DIH examples that come with Solr.

Should I allow a SQL WHERE clause as a REST API parameter, available on the internet?

The project I'm working on has a REST API written in JRuby/Java, with an endpoint that hits a MySQL database to retrieve a number of records.
We need to allow the client to filter those records using one or more columns, including boolean checks and range values.
The easiest way we can do this is to add a string parameter to the API, then add it into the SQL statement.
Collectively, the development team agree that this is a bad idea but the alternative is to provide an almost identical syntax for filtering, which is translated into SQL. The allure of the SQL injection parameter is strong.
So my question is, are there any circumstances under which this is a safe thing to do?
In particular, might we consider using the WHERE clause safely if it's been fully parsed beforehand and identified as such? Or, at the very least, checking for certain trigger words such as DROP, SELECT etc.?
Also if anyone knows of a good library that could act as a go-between (translating or parsing an external expression into a WHERE clause) that would be great.

The OData and GData protocols already implement this functionality in a safe and standard way. You can find server and client implementations for both, for Ruby, PHP, MySQL etc. Check here for the OData libraries

Leaving aside the SQL injection issue, you'll expose your inner implementation (both the database chosen - MySQL and your table structure) directly in the form of your API.
e.g. if you change to some NoSQL-type implementation at the backend, your public-facing API will break immediately. Similarly if you restructure your database. I wouldn't do this even in an environment in which I wasn't worried about the probability/severity of injection attacks.

Besides the security implications, allowing an arbitrary WHERE clause is a bad idea because it takes the "I" out of "API" -- it's not an interface. The API is supposed to free the user of the need to know details of the implementation, like table and column names.
If clients are interacting with your data by constructing their own WHERE clauses, then you can never change the database. There might be code out there with those statements programmed in. If a bug or new feature required you to alter the DB in a way that would break existing client interactions you'd be stuck. The API should provide the filtering capability and translate requests into calls to backend in a way that lets you change the backend without breaking the API.
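A minimal sketch of such a translation layer follows. Public filter names and operators are looked up in whitelists and mapped to column names and SQL operators, while the values only ever travel as bind parameters. The field names, column names and operator keywords are invented for illustration and are not taken from the question's schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical translator from API-level filters to a parameterized WHERE clause. */
final class FilterTranslator {

    /** API field name -> database column (assumed names). */
    private static final Map<String, String> COLUMNS = new LinkedHashMap<>();
    /** API operator keyword -> SQL operator (assumed keywords). */
    private static final Map<String, String> OPERATORS = new LinkedHashMap<>();
    static {
        COLUMNS.put("name", "customer_name");
        COLUMNS.put("createdOn", "created_at");
        COLUMNS.put("active", "is_active");
        OPERATORS.put("eq", "=");
        OPERATORS.put("gte", ">=");
        OPERATORS.put("lte", "<=");
        OPERATORS.put("like", "LIKE");
    }

    /** WHERE fragment with "?" placeholders plus the values to bind, in order. */
    static final class Query {
        final String where;
        final List<Object> params;

        Query(String where, List<Object> params) {
            this.where = where;
            this.params = params;
        }
    }

    /** Each filter is {field, operator, value}; unknown fields or operators are rejected. */
    static Query translate(List<String[]> filters) {
        StringBuilder where = new StringBuilder();
        List<Object> params = new ArrayList<>();
        for (String[] filter : filters) {
            String column = COLUMNS.get(filter[0]);
            String operator = OPERATORS.get(filter[1]);
            if (column == null || operator == null) {
                throw new IllegalArgumentException("Unsupported filter: " + filter[0] + " " + filter[1]);
            }
            if (where.length() > 0) {
                where.append(" AND ");
            }
            // Only whitelisted text is concatenated; the value is bound later.
            where.append(column).append(' ').append(operator).append(" ?");
            params.add(filter[2]);
        }
        return new Query(where.toString(), params);
    }

    public static void main(String[] args) {
        Query q = translate(Arrays.asList(
                new String[] {"name", "like", "foo%"},
                new String[] {"createdOn", "gte", "2012-09-12"}));
        System.out.println(q.where + " " + q.params);
    }
}
```

The resulting fragment and parameter list can be passed to a PreparedStatement. If the backend changes, only the two maps need to change, not the public filter syntax.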

There are numerous ORMs for this purpose, especially in Ruby (ActiveRecord, Sequel).
The most basic thing you need to do is escape the string input, which will pretty much prevent SQL injection if you are doing it properly.
It also helps not to insert parameters directly into the SQL statement if you don't have to; instead, check their validity and then map them to logical ones (this isn't always possible). For example, if there is an HTML dropdown list and submitting the form passes some parameter 'firstitem', map 'firstitem' to an id or similar that you then search on, rather than using the user-supplied value (assuming this mapping doesn't involve the DB).

When to use Hibernate vs. simple ResultSets for a small application

I just started working on upgrading a small component in a distributed Java application. The main application is a rather complicated applet/servlet combo running on JBoss, and it uses Hibernate extensively for its data access. The component I am working on, however, is a very straightforward data importing service.
Basically the workflow is
1. Listen for a network event
2. Parse the data packet, extract a set of identifiers
3. Map the identifier set to a primary key in our database
4. Parse the rest of the packet and insert items in a related table using the foreign key found in step 3
5. Repeat
The previous version of this component used a Hibernate-based DAL that is no longer usable for a variety of reasons (in particular, it is EOL), so I am in charge of replacing the data access layer for this component.
So on the one hand I think I should use Hibernate because that's what the rest of the application does, but on the other I think I should just use regular java.sql.* classes because my requirements are really straightforward and aren't expected to change any time soon.
So my question is (and I understand it is subjective): at what point do you think the added complexity of using an ORM tool (in terms of configuration, dependencies...) is worth it?
UPDATE
Due to the way the data access layer for the main application was written (weird dependencies), I cannot easily use it; I would have to implement it myself.

Consider why the Spring-Hibernate combination is used in the first place.
With plain JDBC, even a simple operation takes many steps: getting a connection, creating a statement and handling the result set, with a lot of exception handling at every step.
With Spring and Hibernate you only need this:
public PostProfiles findPostProfilesById(long id) {
    // HibernateTemplate opens the session, runs the HQL query and translates exceptions
    List<?> list = getHibernateTemplate().find("from PostProfiles where id=?", id);
    // Return null instead of failing with IndexOutOfBoundsException when nothing matches
    return list.isEmpty() ? null : (PostProfiles) list.get(0);
}
The framework takes care of everything else. I hope this resolves your dilemma.

I think the answer really depends on your skill set. It would probably take a similar amount of time to craft a simple solution involving a handful of tables either way (Hibernate or raw JDBC) if you are comfortable with both techniques.
As I am pretty comfortable with Hibernate, I'd just choose it, as I prefer working at a higher level and not worrying about things that Hibernate handles for me. Yes, it has its own glitches, but especially for simple data models it does the job, and does it well.
The only reasons I would choose plain JDBC are:
uber-complicated, maximally optimized SQL that is performance-critical;
Hibernate being stupid and not being able to express what I want.
And especially since you say you are already managing other entities with Hibernate, why not keep your code in the same style everywhere?

I think you are better off using the JDBC API. From what you describe, the two operations (select the foreign key from one table, insert into table_2) can easily be executed with a single stored procedure call.
The advantage of using this technique is that you can manage transactions/exceptions within your stored procedure call.
