Indexing external REST API with Solr, possible?


This question is maybe a weird one, but my employer has asked me to find out and thus I will.
In our application we use an external REST API to search for some data. This REST API can deliver many types of data, for example city names and street names, but it is only possible to look up one type of data at a time. In our app we force the users to choose which data type to look for when they search, but our users no longer want to do this. So if they search for, say, 'los', they want the result to contain both 'Los Angeles' and 'Losing Street'. For this to be possible right now, we would have to do two separate searches in the REST API and merge the results.
So instead my employer has read about Solr and is adamant that it is possible to index the REST API so that we use Solr to search for what we want in one search request. I am not so sure. Is it possible, and is it feasible?

Yes, it is definitely possible to come up with a solution for the requirement described above. Solr is a full-text search engine, and fields are indexed in Solr by default. You can carry out different kinds of operations on these fields through combinations of analyzers and tokenizers. You can map all the searchable fields to one specific field (these are called copy fields, e.g. city name and street name -> a single text name field) and run your search on this one field to get the desired result.
Solr is a RESTful search engine and serves data in XML and, optionally, JSON. It is a really useful platform for operating over huge amounts of data, but it doesn't help much with the analytics part, such as calculations.
A few of the benefits include auto-suggest, highlighting, facets, synonym search, n-gram search, auto-correct, etc.
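The copy-field idea can be pictured without Solr at all: every record, whatever its type, contributes its name to one shared searchable field, and the query runs against that field only. Below is a minimal plain-Java sketch of that idea. The `Entry` and `CatchAllIndex` classes are hypothetical, and the case-insensitive word-prefix match is only a stand-in for what Solr's analyzers would do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical record of one item fetched from the external API.
class Entry {
    final String type; // e.g. "city" or "street"
    final String name; // e.g. "Los Angeles"

    Entry(String type, String name) {
        this.type = type;
        this.name = name;
    }
}

// Illustrative stand-in for a single catch-all search field (not Solr itself).
class CatchAllIndex {
    private final List<Entry> entries = new ArrayList<>();

    void add(Entry entry) {
        entries.add(entry);
    }

    // Returns every entry, regardless of type, that has a word starting with the query.
    List<Entry> search(String query) {
        String q = query.toLowerCase(Locale.ROOT);
        List<Entry> hits = new ArrayList<>();
        for (Entry e : entries) {
            for (String word : e.name.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (word.startsWith(q)) {
                    hits.add(e);
                    break; // one hit per entry is enough
                }
            }
        }
        return hits;
    }
}
```

With a city "Los Angeles" and a street "Losing Street" added to the same index, a search for "los" returns both entries in one call, which is the behaviour the users in the question are asking for.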

I think you should send a feature request to the REST API maintainer to support a composite search.
The only other thing you can do is download the whole database from the REST API and create your own database, which you can then index and search with your custom queries, and which you have to keep in sync with the REST API. I don't think you want to do that. It will work, but so-called REST APIs usually don't decouple clients from the implementation of the service with links and semantic annotations, so I am afraid it will break easily with any change to the API.
Afaik Solr is a storage solution which supports full-text search and has a REST interface.
Solr is a standalone enterprise search server with a REST-like API. You put documents in it (called "indexing") via XML, JSON, CSV or binary over HTTP. You query it via HTTP GET and receive XML, JSON, CSV or binary results.

You should have no trouble posting the data from the REST API to Solr using the Data Import Handler (DIH), Solr's RESTful interface, or something like Spring Data Solr once you actually have the data. The tricky part is how you will "crawl" the third-party REST API data.
Depending on whether the REST API provider gives you any way to paginate through the data, i.e. chronologically or alphabetically, you may be able to write a program outside of Solr that polls the REST API then stores the data in a local database before posting it to Solr. This will be easier if the REST API provider allows you to retrieve new or changed records updated after a certain time, so that your polling is efficient and only retrieves a small amount of data after the initial full indexing. Some REST providers allow using webhooks to notify your application that they have updated data in their API. This may or may not be feasible depending on the amount of data and whether you can limit it by user account, etc. to only contain what you need.
It's important to store the third party data in a local database outside of Solr, since Solr's index data files are volatile and sometimes need to be deleted after making configuration changes. That way, you can write a process to repost the data from your database to Solr without having to crawl the REST API again.
For handling the polling at regular intervals, you could use something like Apache Camel or Spring Integration along with Quartz Scheduler. Both of those support REST endpoints and you can also take a look at the DIH examples that come with Solr.
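The approach described in this answer (poll for changed records, keep a local copy, and repost from that copy when the index has to be rebuilt) could be sketched roughly as follows. Everything here is an assumption made for illustration: `RemoteSource` stands in for the third-party REST API, `SearchIndex` for Solr, and the in-memory map for a real database. None of these names come from Solr or from the API in question.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record pulled from the remote API.
class Item {
    final String id;
    final String name;
    final long updatedAt; // epoch millis of the last change

    Item(String id, String name, long updatedAt) {
        this.id = id;
        this.name = name;
        this.updatedAt = updatedAt;
    }
}

// Hypothetical source; a real one would call the third-party REST API.
interface RemoteSource {
    // Returns records changed strictly after the given timestamp.
    List<Item> changedSince(long timestamp);
}

// Hypothetical sink; a real one would post documents to Solr.
interface SearchIndex {
    void index(Item item);
}

class Syncer {
    // Stands in for a local database that survives index rebuilds.
    private final Map<String, Item> localStore = new LinkedHashMap<>();
    private long lastSeen = 0L;

    // Pulls only new or changed records, saves them locally, then indexes them.
    int poll(RemoteSource source, SearchIndex index) {
        List<Item> changed = source.changedSince(lastSeen);
        for (Item item : changed) {
            localStore.put(item.id, item);
            index.index(item);
            lastSeen = Math.max(lastSeen, item.updatedAt);
        }
        return changed.size();
    }

    // Refills an index from the local copy without touching the remote API.
    int reindexAll(SearchIndex index) {
        for (Item item : localStore.values()) {
            index.index(item);
        }
        return localStore.size();
    }
}
```

A scheduler such as the ones mentioned above would call `poll` at regular intervals, while `reindexAll` covers the case where the index files had to be deleted after a configuration change.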

Related

Domino java xpage - caching values server-wide

I have a Java XPages application with a REST service that functions as an API for rooms & resources database (getting appointments for specific room, creating etc).
The basic workflow is that an HTTP request is being made to a specific REST action, having the room's mail address in the search query. Then in the java code I'm iterating over all documents from the rooms & resources database, until I find a document with the InternetAddress field with the searched mail address.
This isn't as fast as I would like it to be, and there are multiple queries like this being made all the time.
I'd like to do some sort of caching in my application, so that when a room is found once, its document UNID is stored in a server-wide cache. The next time a request is made for this mail address, I can go directly to the document using getDocumentByUNID(), which I think should be way faster than searching over the entire database.
Is it possible to have such a persistent lookup table in Java XPages without any additional applications, while keeping it as fast as possible? A hash table would be perfect for this.
To clarify: I don't want caching within a single request, because I'm not doing more than one database lookup per request. I want the cache to be server-wide, so that it is kept between multiple requests.

Yes, it is possible to store persistent data. What you are looking for is called an application scoped managed bean.
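As a rough illustration of what such a bean could hold, here is a minimal sketch of a thread-safe lookup table keyed by mail address. The class name `RoomCache` and its methods are hypothetical. In XPages the class would typically be registered as an application-scoped managed bean in faces-config.xml, which is not shown here, and the slow lookup passed in stands for the existing iteration over the rooms database.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache class; one shared instance would live in application scope.
class RoomCache {
    // Maps a lower-cased mail address to the document UNID found for it.
    private final Map<String, String> unidByAddress = new ConcurrentHashMap<>();

    // Returns the cached UNID, running the slow lookup only on a cache miss.
    // The lookup receives the lower-cased address; if it returns null,
    // nothing is cached and null is returned.
    String getUnid(String mailAddress, Function<String, String> slowLookup) {
        String key = mailAddress.toLowerCase(Locale.ROOT);
        return unidByAddress.computeIfAbsent(key, slowLookup);
    }

    // Drops one entry, e.g. after the room document was deleted or replaced.
    void invalidate(String mailAddress) {
        unidByAddress.remove(mailAddress.toLowerCase(Locale.ROOT));
    }
}
```

The REST handler would then ask the cache first and call getDocumentByUNID() with the returned value, falling back to the full scan only when the address has not been seen before.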

System Design: How can I design a RESTful API that allows querying of results asynchronously

I need to build a /search API that allows someone to send a POST and retrieve an ID that can be queried later via a separate /results API.
I've looked at Spring methods:
DeferredResult
@Async
but neither seems to demonstrate returning an ID from a search. I need a system that can remember the ID and reference it when someone calls the /results API to retrieve the results of a specific search.
Are there any examples of a Spring application doing this?

Remember that RESTful services are stateless, so it is not good practice to keep the state of your searches in server memory.
One solution is to store your search state in a database (SQL/NoSQL) and use Spring's cache support to improve response times.
When a user requests a new search via /search, the server must generate the ID, prepare the results and persist them in the database, and then send the new ID to the client. Later, the client requests its results using /results/{searchId}.
Please let me know if you'll use this possible solution and I'll share an example on GitHub.
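The flow described in this answer (generate an ID, persist the results under it, hand the ID back, look the results up later) can be sketched in plain Java as below. This is not the example the answer offers to share; `SearchStore` and its methods are hypothetical, a concurrent map stands in for the database, and the Spring controller wiring for /search and /results/{searchId} is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical store; a real one would persist to an SQL or NoSQL database.
class SearchStore {
    private final Map<String, List<String>> resultsById = new ConcurrentHashMap<>();

    // What a POST /search handler could do: persist the results and return a new ID.
    String submit(List<String> results) {
        String id = UUID.randomUUID().toString();
        resultsById.put(id, results);
        return id;
    }

    // What a GET /results/{searchId} handler could do: look the results up by ID.
    // An empty Optional would map to a 404 response.
    Optional<List<String>> results(String searchId) {
        return Optional.ofNullable(resultsById.get(searchId));
    }
}
```

Because the ID is the only thing the client has to remember, any server instance that can reach the store can answer the later /results call.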

Spring Data Rest: How to search by another object's key?

In Spring-Data-Rest an object's ID is the URI returned with the object. For example, if I have a User, it might have a URI like:
http://example.com/users/1
In an authentic REST API, this URI is the ID of the object; you are not supposed to use just '1' as the ID.
Given that, how do I search for all Orders that belong to that User?
http://example.com/orders/search/findByUser?user={{XXX}}
Specifically, what do I use for {{XXX}}?
I know I could do the opposite search:
http://example.com/users/1/orders
But in my case I need to search for matching jobs so I can add additional parameters which are also keys.
I can export /orders/search/findByUser by creating this function definition on OrderRepository:
List<Order> findByUser(User user);
and findByUser will be exported by Spring-Data-REST, but how do I specify the User as a GET parameter?
Again, I'm specifically looking for the pure REST solution, since the Spring Data Rest project is trying to encourage purity.

You might take a look at the Query annotation of Spring Data. It enables you to execute custom queries without the need of a custom controller.
Edit:
Query parameters are a good way to filter a resource by simple properties. As SDR serializes all complex types as relations, it is even clearer that filtering only applies to the remaining (simple) properties.
If you have only one relation, you correctly mentioned the way of doing the 'inverse' search as you called it by using /users/1/orders.
If you want to search by multiple relations, I suggest you define a separate search (sub-)resource and perform the search by issuing a POST request to this resource.
For example:
POST /orders/search
{
"user": "http://example.org/users/1",
...
}
This way, SDR will correctly translate the URIs into entities. However, I think you will need to use a custom controller here but it should be possible to still use the Spring Data repository and provide the user and additional entities as parameter.
For further information, see the following SO-questions:
How to design RESTful search/filtering?
RESTful URL design for search
Edit2:
Addressing the point that using a POST for searching violates the REST spec:
REST was designed to be simple. One of the key advantages of REST is that you are not forced to do anything. You can adapt the spec until it fits your needs. Of course, this could mean that your API is less RESTful but you should consider if it is worth it to strictly stick to the spec if it introduces an unnecessary overhead for the consumers of your API.
Of course, you can design the above idea to fully meet the REST spec. This would involve creating a separate search entity, persisting it to the database, and later retrieving the result of the search with a call to a subresource like /result or something similar. However, the question is whether it is worth it.
In your concrete example, I would just require the client to parse the ID out of the link and provide it as query parameter. If you are scaling up your application later, you could introduce a feature like named searches and apply the above mentioned solution.
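Parsing the ID out of the link, as suggested for the concrete example, could look like the following minimal sketch. It assumes the ID is always the last path segment of the resource URI, as in http://example.com/users/1. The class and method names are hypothetical.

```java
import java.net.URI;

class ResourceLinks {
    // Returns the last path segment of a resource URI,
    // e.g. "1" for http://example.com/users/1.
    static String idOf(String resourceUri) {
        String path = URI.create(resourceUri).getPath();
        if (path == null) {
            path = "";
        }
        // Tolerate a single trailing slash such as /users/1/
        if (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        String id = path.substring(path.lastIndexOf('/') + 1);
        if (id.isEmpty()) {
            throw new IllegalArgumentException("No ID segment in " + resourceUri);
        }
        return id;
    }
}
```

The client would then pass the extracted value as the query parameter, for example /orders/search/findByUser?user=1.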

If you use a controller, as seems to be your case, you can pass it any parameter(s) you consider necessary. Take a look at this question:
Spring Data REST custom query integration

See https://jira.spring.io/browse/DATAREST-502
Depending on your version of Spring Data, it may or may not work as you want.
By the way, I still think the POST should be an option too, it would be much cleaner.

Implementing a RESTful service

I'm building a web service to support an Android e-reader app I'm making for our campus magazine. The service needs to return issue objects to the app, each of which has a cover image and a collection of articles to be displayed. I'd like some general input on two strategies I'm considering, and/or some specific help on a few issues I'm having with them:
Strategy 1: Have 2 DB tables, Issues and Articles: The Issues table contains simply an int id, varchar name and varchar imageURI. Articles contains many more columns (headline, content, blurb, etc.), with each article on a separate row. One of the columns is issueID, which points to the issue to which the article belongs. When issue number n is requested, the service first pulls its data from the Issues table and uses it to create a new Issue object. The constructor instantiates a new List<Article> as an instance variable and populates it by pulling all articles with the matching issueID from the Articles table. What I can't figure out with this option is exactly how to execute it at a single endpoint, so that the app only has to create one HTTP connection to get everything it needs for the issue (or is this not as important as I think it is?).
Strategy 2: Have a single Issues table with the id, name, and imageURI columns, plus a large number of additional text Article1... text Article40 columns. The Articles are packaged into JSONObjects before being uploaded to the server, and these JSONObjects (which will be very long) are stored directly in the database. My worry here is that the text fields will be too long, plus I have a nagging suspicion that this strategy isn't in line with best practices (although I can't put my finger on why...)
Also, this being my first web service, and given the requirements specified above, would it be advisable to use Spring (or some other framework), or am I better off just using JAX-RS?

There are 2 questions here
How to convert your objects to JSON and expose them with a rest service.
How to store/retrieve your data.
To implement your webservices, Jersey is my favorite option. It is the open-source reference implementation of the JSR 311 (JAX-RS). In addition, Jersey uses Jackson to automatically handle the JSON/Object mapping.
To store your data, your second option... is clearly not an option :)
The first solution seems viable.
IMHO, as your application seems tiny, you should not put JPA/Hibernate etc. in place. You should simply make one request by hand with a JOIN between Issues and Articles, populate the requested Issue, and then let Jackson automatically convert your object to JSON.
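A minimal sketch of the "one JOIN, then populate the Issue" step might look like this. Table and column names follow the question. The `JoinedRow` class is a hypothetical holder for one row of the JOIN result, and the JDBC code that would fill it, as well as the Jackson serialization, is left out.

```java
import java.util.ArrayList;
import java.util.List;

class Article {
    final String headline;
    final String content;

    Article(String headline, String content) {
        this.headline = headline;
        this.content = content;
    }
}

class Issue {
    final int id;
    final String name;
    final String imageUri;
    final List<Article> articles = new ArrayList<>();

    Issue(int id, String name, String imageUri) {
        this.id = id;
        this.name = name;
        this.imageUri = imageUri;
    }
}

// Hypothetical holder for one row of the JOIN result.
class JoinedRow {
    final int issueId;
    final String issueName;
    final String imageUri;
    final String headline; // null when the issue has no articles (LEFT JOIN)
    final String content;

    JoinedRow(int issueId, String issueName, String imageUri, String headline, String content) {
        this.issueId = issueId;
        this.issueName = issueName;
        this.imageUri = imageUri;
        this.headline = headline;
        this.content = content;
    }
}

class IssueAssembler {
    // Rows are assumed to come from a single query such as:
    //   SELECT i.id, i.name, i.imageURI, a.headline, a.content
    //   FROM Issues i LEFT JOIN Articles a ON a.issueID = i.id
    //   WHERE i.id = ?
    // Returns null when no issue matched.
    static Issue assemble(List<JoinedRow> rows) {
        if (rows.isEmpty()) {
            return null;
        }
        JoinedRow first = rows.get(0);
        Issue issue = new Issue(first.issueId, first.issueName, first.imageUri);
        for (JoinedRow row : rows) {
            if (row.headline != null) {
                issue.articles.add(new Article(row.headline, row.content));
            }
        }
        return issue;
    }
}
```

Since the whole Issue, including its articles, is built from one query, a single endpoint can return it in one response and the app needs only one HTTP request per issue.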

Generate Search SQL from HTTP GET request parameters

We have a Java web app with a hibernate backend that provides REST resources. Now we're facing the task to implement a generic search that is controlled by the query parameters in our get request:
some/rest/resource?name_like=foo&created_on>=2012-09-12&sort_by_asc=something
or similar.
We don't want to predefine all possible parameters (name, created_on, something).
We don't want to have to analyze the request string to pick up control characters (like >=),
nor do we want to implement our own grammar to reflect things like _eq, _like, _goe and so on (as an alternative or addition to control characters).
Is there some kind of framework that provides help with this mapping from GET request parameters to database query?
Since we know which REST resource we're GETing we have the entity / table (select). It probably will also be necessary to predefine the JOINs that will be executed in order to limit the depths of a search.
But other than that we want the REST consuming client to be able to execute any search without us having to predefine how a certain parameter and a certain control sequence will get translated into a search.
Right now I'm trying a semi-automatic solution building on Mysema's Querydsl. It allows me to predefine the where columns and sort columns, and I'm working on a simple string comparison to detect things like '_like', '_loe', ... in a parameter and then activate the corresponding predefined part of the search. Not much different from an SQL string, except that it's SQL-injection-proof and type-safe.
However I still have to tell my search object that it should be able to potentially handle a query "look for a person with name like '???'". Right now this is okay as we only consume the REST resource internally and isolate the actual search creation quite well. If we need to make a search do more we can just add more predefinitions for now. But should we make our REST resources public at some time in the future that won't be so great.
So we're wondering: there has to be some framework, best practice, or recommended solution for approaching this. We're not the first to want this. Redmine, for example, offers all of its resources via a REST interface and I can query at will. Or Facebook with its Graph API. I'm sure those guys didn't just predefine all possibilities but rather created some generic grammar. We'd like to save as much of that effort as possible and use available solutions instead.
Like I said, we're using Hibernate so an SQL or HQL solution would be fine or anything that builds on entities like QueryDsl. (Also there's the security issue concerning SQL injection)
Any suggestions? Ideas? Will we just have to do it all ourselves?
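For reference, the semi-automatic approach described in the question (detecting suffixes such as '_like' or '_loe' on a parameter name and mapping them onto predefined fields) could be sketched as below. This is only an illustration of the string-matching step. `Op`, `Filter` and `FilterParser` are hypothetical names, and turning a `Filter` into a Querydsl or Hibernate predicate is not shown.

```java
import java.util.Locale;
import java.util.Set;

// Operators taken from the suffixes mentioned in the question.
enum Op { EQ, LIKE, GOE, LOE }

class Filter {
    final String field;
    final Op op;
    final String value;

    Filter(String field, Op op, String value) {
        this.field = field;
        this.op = op;
        this.value = value;
    }
}

class FilterParser {
    // Whitelist of searchable fields; anything else is rejected.
    private final Set<String> allowedFields;

    FilterParser(Set<String> allowedFields) {
        this.allowedFields = allowedFields;
    }

    // Splits e.g. "created_on_goe" into field "created_on" and operator GOE.
    // A parameter without a known suffix is treated as an equality filter.
    Filter parse(String param, String value) {
        String field = param;
        Op op = Op.EQ;
        for (Op candidate : Op.values()) {
            String suffix = "_" + candidate.name().toLowerCase(Locale.ROOT);
            if (param.endsWith(suffix)) {
                field = param.substring(0, param.length() - suffix.length());
                op = candidate;
                break;
            }
        }
        if (!allowedFields.contains(field)) {
            throw new IllegalArgumentException("Unknown field: " + field);
        }
        return new Filter(field, op, value);
    }
}
```

Because the field name is checked against a whitelist and the value is only ever carried as data, the resulting `Filter` can be bound as a query parameter rather than concatenated into SQL, which keeps the injection concern raised in the question out of this step.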

From a .NET perspective the closest thing I can think of would be a WCF data service.
Take a look at the URI conventions specified on the OData website. There is some good information in section 4.5, Filter System Query Option. You'll notice that a lot of the examples on this site are .NET related, but there are other suggestions for getting this to work with Java.
