Requirements:
The REST API user is able to query a list of entities by providing a List/Set/Array of ids (UUIDs).
The endpoint returns a list of the requested entities, or an empty list if nothing is found.
The number of queried ids (UUIDs) is not limited.
My first idea was to use a GET endpoint like this in my RestController:
@GetMapping(path = "{ids}")
public List<FooResponseTO> getFoos(@PathVariable @NotEmpty Set<UUID> ids) {
    return someService.getFoos(ids);
}
This idea does not seem to be optimal because URL length is limited (for example in browsers). I think it would be best to move the ids to the request body but that is not recommended for GET requests. Using POST instead of GET seems also wrong since GET is supposed to be used to request data from a specified resource and POST is supposed to be used to send data to create or update a resource.
What is the best way to design the endpoint to meet the requirements?
There isn't a "best" way to do this, today. Just a bunch of different compromises.
In late 2020, the HTTP working group agreed to adopt SEARCH - that method's semantics are going to be extended to cover some "GET with a body" cases; when it has been standardized and vendors start supporting it, that will be your best choice.
In the meantime, your standardized options are GET (with the arbitrary list of ids encoded in the URI, and the liabilities that come from exceeding URI length limits), and POST (losing the advantages of safe semantics, and compromising caching).
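If you accept the POST compromise, a minimal sketch of such an endpoint might look like this (assuming Spring MVC; SomeService and FooResponseTO are stand-ins for the hypothetical names in the question):

import java.util.List;
import java.util.Set;
import java.util.UUID;

import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class FooController {

    // Stand-ins for the question's hypothetical service and DTO
    interface SomeService { List<FooResponseTO> getFoos(Set<UUID> ids); }
    record FooResponseTO(UUID id) {}

    private final SomeService someService;

    public FooController(SomeService someService) {
        this.someService = someService;
    }

    // POST carries the id list in the body, sidestepping URI length limits
    // at the cost of safe semantics and caching.
    @PostMapping("/foos/search")
    public List<FooResponseTO> searchFoos(@RequestBody Set<UUID> ids) {
        return someService.getFoos(ids); // empty list when nothing is found
    }
}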
Another possibility would be to switch your resource model from fine grained (ask for exactly the identifiers you want) to coarse grained (get pages of identifiers, let the client get the ones that they want). Coarse grained resources are a much better "fit" for large scale applications, because caching.
If you aren't limited to standardized choices (for example, because you control both the client and the server) then you could also consider using an unstandardized SEARCH method-token, or inventing your own method-token to mean precisely what you need it to.
I'm developing a Java EE REST microservice-oriented CQRS + EventSourcing app. I have an entity (Artwork) with many fields, and I have to record each update to this entity according to the EventSourcing pattern (basically each update creates a new event, and the artwork is then rebuilt from these events).
My approach basically works, but I'm stuck on "compliance" with HTTP standards: I want to avoid a "generic" update in which you update the whole entity, because it would be a mess to handle each single field update (and the consequent event generation).
So this is what I did.
Let's say that I have this entity:
public class Artwork {
    int id;
    String field1;
    String field2;
    ...
}
Then I created as many request classes as there are fields to update (not all fields can be updated, such as the ID):
public class field1UpdateRequest {
    String newValue; // the new value for field1
}
and the same for field2.
These updates are handled using a PUT request; when such a request arrives, it is handled by something like this:
HTTP → Controller → Service → (DAOs, etc.)
So in the controller class I have a PUT http://...//updatefield1 method that accepts field1UpdateRequest objects.
My question is:
Is this right to do? How can I explain that this is right (if it is)? Should these requests be PATCH rather than PUT? Should a generic PUT request also be included (even if I'm scared that this will make the event sourcing part more difficult)?
In a CQRS approach, it's important to remember that the C stands for Command. Every request to your "write-side" is thus a command. A generic "here is the new value for this resource" request (which is what REST tends to lead to) can be interpreted as a "use this value henceforth" command, but it is a bit of an impedance mismatch with CQRS, because it's a fairly anemic command. There are definitely cases where having that in an API can be called for (and if it's an exceptionally rare request, you may even be able to get away with modeling it as a single "new beginning" event rather than teasing out finer-grained events; this has the cost of shifting some complexity out to consumers of the events).
With that in mind, an alternative approach that updates parts of an object is a little more of a fit with CQRS (though in this case, you are shifting some complexity to the requestor, at least if that requestor wants to do wholesale updates). HTTP PUT sounds proper to me: the command is "use this value for this field/component of the entity".
That said, an even more CQRSy API would instead focus on the higher-level activities which motivate a change to the entity. For instance if you're tracking the current owner of the artwork as of when it was sold, you might have a currentOwner and a currentOwnerAcquired field in your artwork entity. When recording a sale, you would want to update both, so a POST /artworks/{artworkId}/transferOwnership endpoint taking something like
{
"transferor": "Joe Bloggs",
"transferee": "Jack Schmoe",
"date": "2021-12-24T00:00:01Z"
}
would allow the update to be a single transaction and allow you to encode the "why" as well as the "what" in your events (which is an advantage of event sourcing).
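A minimal controller sketch of that endpoint (assuming Spring MVC rather than the question's Java EE stack; ArtworkService and the request class are illustrative names):

import java.time.Instant;
import java.util.UUID;

import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ArtworkController {

    // Hypothetical application service that records the richer event(s)
    interface ArtworkService {
        void transferOwnership(UUID artworkId, String transferor, String transferee, Instant date);
    }

    static class TransferOwnershipRequest {
        public String transferor;
        public String transferee;
        public Instant date; // ISO-8601 timestamp, as in the JSON above
    }

    private final ArtworkService artworkService;

    public ArtworkController(ArtworkService artworkService) {
        this.artworkService = artworkService;
    }

    // One command, one transaction, and events that capture the "why"
    @PostMapping("/artworks/{artworkId}/transferOwnership")
    public void transferOwnership(@PathVariable UUID artworkId,
                                  @RequestBody TransferOwnershipRequest request) {
        artworkService.transferOwnership(artworkId, request.transferor,
                request.transferee, request.date);
    }
}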
So in the controller class I have a PUT http://...//updatefield1 method that accepts field1UpdateRequest objects.
Is this right to do?
It might be, but it probably isn't.
Here's the key idea: your REST API is a facade; it supports the illusion that your server stores and produces documents. In other words, you're providing an interface to your data that makes it look like every other site on the web.
The good news: when you do that, you get (for free!) the benefits of a bunch of general purpose work that has already been done for you.
But the cost of these things that you get for free is that, for everything to "just work", you need to handle the impedance mismatch between HTTP (which is based on an idiom of documents) and your domain model.
So I send you messages to "edit your documents", and you in turn figure out how to translate those messages into commands for your domain model.
In HTTP, both PUT and PATCH have remote authoring semantics. Both of those messages mean "make your copy of the document look like my copy". They are the flavor of HTTP messages you would use to (for example) edit the title of an HTML document on your web server.
The semantics are fundamentally anemic. My message tells you how I want your copy of the document to look, your server is responsible for figuring out how to achieve that.
And that's fine when you are working with documents, or using documents as a facade in front of a data model. But matching remote authoring requests with a domain model is a lot harder.
(Recommended reading: Greg Young 2010 on task-based user interfaces).
In the case of a domain model, you normally want to send to the server a representation of a command message. HTTP really wants you to deal with command messages in one of two ways:
treat the command message as a document/resource of its own, to be stored on the server (the changes to the domain model are a side effect of storing a new command message)
POST the command message to the resource most directly impacted by the change.
(See Fielding, 2009; it is okay to use POST).
In both cases, the HTTP application itself knows nothing about what's going on at the domain level, it is only concerned with the transfer of documents over the network.
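As a sketch of the first option (all names here are hypothetical): the client PUTs a command document under a URI of its own choosing, and the domain change happens as a side effect of storing it:

import java.util.UUID;

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PutMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class CommandController {

    // Hypothetical collaborators
    record UpdateField1Command(String newValue) {}
    interface CommandStore { boolean save(UUID id, UpdateField1Command command); } // true if new
    interface CommandBus { void dispatch(UpdateField1Command command); }

    private final CommandStore commandStore;
    private final CommandBus commandBus;

    public CommandController(CommandStore commandStore, CommandBus commandBus) {
        this.commandStore = commandStore;
        this.commandBus = commandBus;
    }

    // The client picks the commandId, so retrying the PUT is idempotent.
    @PutMapping("/commands/{commandId}")
    public ResponseEntity<Void> storeCommand(@PathVariable UUID commandId,
                                             @RequestBody UpdateField1Command command) {
        boolean created = commandStore.save(commandId, command); // store the document
        if (created) {
            commandBus.dispatch(command); // the domain change is a side effect of storage
        }
        return created ? ResponseEntity.status(HttpStatus.CREATED).build()
                       : ResponseEntity.noContent().build();
    }
}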
HTTP doesn't really place any constraints on your resource model - if you want to publish all of your information in one document, that's fine. If you want to distribute your information across many documents, that's also fine.
So taking a single entity in your domain, and distributing its information across many resources is fine.
BUT: remember caching. HTTP has simple rules for automatically invalidating previously cached responses; separating the resource you use for reading information from the resource that you use for editing information makes caching harder (caution: caching is already one of the two hard problems).
In other words: trade offs.
For a RESTful service, can the noun be omitted and discarded?
Instead of /service/customers/555/orders/111
Can / should I expose: /service/555/111 ?
Is the first option mandatory or are there several options and this is debatable?
It's totally up to you. I think the nice thing about having the nouns is that they help you see from the URL what the service is trying to achieve.
Also, take into account that under a customer you can have something like the below, where the URL lets you distinguish between an order and a quote for a customer:
/service/customers/555/quote/111
/service/customers/555/order/111
One of the core aspects of REST is that URLs should be treated as opaque entities. A client should never create a URL, just use URLs that have been supplied by the server. Only the server hosting the entities needs to know something about the URL structure.
So use the URL scheme that makes most sense to you when designing the service.
Regarding the options you mentioned:
Omitting the nouns makes it hard to extend your service if e.g. you want to add products, receipts or other entities.
Having the orders below the customers surprises me (but once again, that's up to you designing the service). I'd expect something like /service/customers/555 and /service/orders/1234567.
Anyway, the RESTful customer document returned from the service should contain links to his or her orders and vice versa (plus all other relevant relationships).
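For example, a customer document might carry those links like this (a sketch with illustrative URIs; the exact link format depends on the media type you choose):

{
  "id": 555,
  "name": "Jane Doe",
  "links": {
    "self": "/service/customers/555",
    "orders": "/service/customers/555/orders",
    "quotes": "/service/customers/555/quotes"
  }
}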
To a certain degree, the "rules" for naming RESTful endpoints should follow the same naming rules that "Clean Code", for example, teaches.
Meaning: names should mean something. And they should say what they mean, and mean what they say.
Coming from there: it probably depends on the nature of that service. If you can only "serve" customers, then you could omit the customer part, because it doesn't add (much) meaningful information. But what if you later want to serve other kinds of clients?
In other words: we can't tell you what is right for your application - because that depends on the requirements / goals of your environment.
And worth noting: do not only consider today's requirements. Step back and consider the "future growth paths" that seem most likely. And then make sure that the API you are defining today will work nicely with those future extensions that are most likely to happen.
Instead of /service/customers/555/orders/111
Can / should I expose: /service/555/111 ?
The question is broad, but as you use REST paths to define nested information, they have to be as explicit as required.
If providing long paths in the URL is a problem for you, as an alternative, provide the contextual information in the body of the request.
I think that the short way /service/555/111 lacks consistency.
Suppose that /service/555/111 corresponds to invoking the service for customer 555 and order 111.
You know that. But the client of the API doesn't necessarily know what the path segments mean.
Besides, suppose now that you wish to invoke the same service for customer 555 but for the year 2018. How do you do that now?
Like this: /service/555/2018 would be error prone, as you would have to add a parameter to convey the meaning of the last path value, and /service/555/years/2018 would make your API definition inconsistent.
Clarity, simplicity and consistency matters.
In my opinion, the use of nouns is not required by any standard, but it does help your endpoint be more specific and simpler to understand.
So if a naming scheme makes your URL more human-readable or easier to understand, that is the kind of URL I usually prefer to create, and it keeps things simple. It also helps the consumers of your service, who can partially understand what a service does from its name alone.
Please follow https://restfulapi.net/resource-naming/ for the best practices.
For a RESTful service, can the noun be omitted and discarded?
Yes. REST doesn't care what spelling you use for your resource identifiers.
URL shorteners work just fine.
Choices of spelling are dictated by local convention; they are much like variables in that sense.
Ideally, the spellings are independent of the underlying domain and data models, so that you can change the models without changing the API. Jim Webber expressed the idea this way:
The web is not your domain, it's a document management system. All the HTTP verbs apply to the document management domain. URIs do NOT map onto domain objects - that violates encapsulation. Work (ex: issuing commands to the domain model) is a side effect of managing resources. In other words, the resources are part of the anti-corruption layer. You should expect to have many many more resources in your integration domain than you do business objects in your business domain.
Resources adapt your domain model for the web
That said, if you are expecting clients to discover URIs in your documentation (rather than by reading them out of well-specified hypermedia responses), then it's going to be a good idea to use URI spellings that follow a simple conceptual model.
I just want to know the high-level steps of the process. Here are my thoughts on the process:
Assumption: the API returns JSON
Check the API document to see the structure of the returned JSON
Create a corresponding Java class (ex: Employee)
Make Http call to the endpoint to get the JSON response
Use some JSON library (such as GSON or Jackson) to unmarshal the JSON string to an Employee object.
Manipulate the Employee object
However, what if the JSON returned by the API is changed? It's a really tedious task to examine the JSON string every now and then and adjust the corresponding Java class.
Can anyone help me out with this understanding? Thanks.
You describe how to consume a JSON-over-HTTP API, which is fine since most of the APIs out there are just that. If you are interested in consuming RESTful HTTP resources, however, one way would be:
Check the API documentation, i.e. the media-types that your client will need to support in order to communicate with its resources. Some RESTafarians argue that all media-types should be standardized, so all clients could potentially support them, but I think that goes a bit far.
Watch out for link representations and processing logic. Media-types do not only describe the format of the data, but also how to process it: how to display it if it's an image, how to run code that might be part of the message, how to lay it out on the screen, how to use embedded controls like forms, etc.
Create corresponding Java classes. If the resources "only" describe data (which they usually do in an API context), then simple Java classes will do; otherwise, more might be needed. For example: can the representation contain JavaScript to run on the client? You need to embed a JavaScript engine, and prepare your class to do just that.
Make a call to a bookmarked URI if you have one. There should be no hardcoded SOAP-like "endpoint" you call. You start with bookmarks and work your way to the state your client needs to be in.
Usually your first call goes to the "start" resource. This is the only bookmark you have in the beginning. You specify the media-types you support for this resource in the Accept header.
You then check whether the returned Content-Type matches one of your accepted media-types (remember, the server is free to ignore your preferences), and then you process the returned representation according to its rules.
For example, you want to get all the accounts for customer 123456, for which you don't yet have a bookmark. You might first GET the start resource for account management. The processing logic there might describe a link to go to for account listings. You follow the link. The representation there might give you a "form" in which you have to fill out the customer number and POST. Finally, you get your representation of the account list. You may at this point bookmark the page, so you don't have to go through the whole chain the next time.
Process the representation. This might involve displaying it, running it, or just handing over the data to some other class.
Sorry for the long post, slow day at work :) Just for completeness, some other points the client needs to know about: caching, handling bookmarks (reacting to 3xx codes), following links in representations.
Versioning is another topic you mention. This is a whole discussion onto itself, but in short: some people (myself included) advocate versioning the media-type. Non-backwards compatible changes simply change the media type's name (for example from application/vnd.company.customer-v1+json, to application/vnd.company.customer-v2+json), and then everything (bookmarks for example) continues to work because of content negotiation.
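To make the early steps concrete, here is a minimal sketch of a client that states its accepted (versioned) media type, checks the returned Content-Type, and unmarshals the representation (assuming Jackson and the JDK 11+ HttpClient; the Employee class and the URI are illustrative):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import com.fasterxml.jackson.databind.ObjectMapper;

public class EmployeeClient {

    // Illustrative representation class; field names mirror the JSON document
    static class Employee {
        public long id;
        public String name;
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Say which media types we support via the Accept header
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.example.com/employees/42"))
                .header("Accept", "application/vnd.company.employee-v1+json")
                .GET()
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

        // The server is free to ignore our preferences, so check the Content-Type
        String contentType = response.headers().firstValue("Content-Type").orElse("");
        if (!contentType.startsWith("application/vnd.company.employee-v1+json")) {
            throw new IllegalStateException("Unsupported media type: " + contentType);
        }

        // Unmarshal the representation into the corresponding Java class
        Employee employee = new ObjectMapper().readValue(response.body(), Employee.class);
        System.out.println(employee.name);
    }
}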
There are many ways to consume RESTful APIs.
Typically, you need to know what version of the API you are going to use. When the API changes (i.e. a different version is exposed) you need to decide if the new functionality is worth migrating your application(s) to the latest and greatest or not...
In my experience, migrating to a new API always requires some effort and it really depends on the value of doing so (vs. not doing it) and/or whether the old API is going to be deprecated and/or not supported by the publisher.
The HATEOAS principle: "Clients make state transitions only through actions that are dynamically identified within hypermedia by the server".
Now I have a problem with the word dynamically, though I guess it's the single most important word there.
If I change one of my parameters from, say, optional to mandatory in the API, I HAVE to fix my client, or else the request will fail.
In short, all HATEOAS does is give the server side developer extreme liberty to change the API at will, at the cost of all clients using his/her API.
Am I right in saying this, or am I missing something like versioning or maybe some other media-type than JSON which the server has to adopt?
Any time you change a parameter from optional to mandatory in an API, you will break consumers of that API. That it is a REST API that follows HATEOAS principles does not change this in any way. Instead, if you wish to maintain compatibility you should avoid making such changes; ensure that any call made or message sent by a client written against the old API will continue to function as expected.
On the other hand, it is also a good idea to not write clients to expect the set of returned elements to always be identical. They should be able to ignore additional information given by a server if the server chooses to provide it. Again, this is just good API design.
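As a concrete example of that advice, with Jackson you can build a "tolerant reader" that ignores elements it doesn't know about (a sketch; the Book class and the JSON are illustrative):

import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

public class TolerantReaderExample {

    // Only the fields this client actually cares about
    static class Book {
        public String title;
    }

    public static void main(String[] args) throws Exception {
        // Pretend the server started returning an extra "pageCount" element
        String json = "{\"title\":\"REST in Practice\",\"pageCount\":448}";

        ObjectMapper mapper = new ObjectMapper()
                .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);

        Book book = mapper.readValue(json, Book.class); // unknown fields are ignored
        System.out.println(book.title);
    }
}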
HATEOAS isn't the problem. Excessively rigid API expectations are the problem. HATEOAS is just part of the solution to the problem (as it potentially relieves clients from having to know vast amounts about the state model of the service, even if it doesn't necessarily make it straightforward).
Donal Fellows has a good answer, but there's another side to the same coin. The HATEOAS principle doesn't have anything to say itself about the format of your messages (other parts of REST do); instead, it means essentially that the client should not try to know which URIs to act upon out of band. Instead, the server should tell the client which URIs are of interest via hyperlinks (or forms/templates which construct hyperlinks). How it works:
1. The client starts at state 0.
2. The client requests a well-known resource.
3. The server's response moves the client to a new state N. There may be multiple states achievable at this point, depending on the response code and payload.
4. The response includes links (or forms/templates) which tell the client, in band, the set of potential next states.
5. The client selects one of the potential next states by issuing a method on a URI.
6. Repeat steps 3 through 5 for states N+1 and beyond until the client's application needs are met.
In this fashion, the server is free to change the URI that moves the client from state N to state N+1 without breaking the client.
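A sketch of steps 2 through 5 in code (assuming the JDK 11+ HttpClient, Jackson, and a HAL-style _links convention; HATEOAS itself mandates none of these):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class HypermediaClient {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        ObjectMapper mapper = new ObjectMapper();

        // Step 2: request the one well-known resource (illustrative URI)
        JsonNode start = get(client, mapper, "https://api.example.com/");

        // Steps 4-5: pick the next state from a link supplied in band;
        // the URI itself is never hardcoded in the client
        String ordersUri = start.at("/_links/orders/href").asText();
        JsonNode orders = get(client, mapper, ordersUri);
        System.out.println(orders);
    }

    static JsonNode get(HttpClient client, ObjectMapper mapper, String uri) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(uri)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return mapper.readTree(response.body());
    }
}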
It seems to me that you misunderstood the quoted principle. Your question suggests that you think about the resources and that they could be "dynamically" defined, like a mandatory property added to a certain resource type at application runtime. This is not what the principle says, and this was correctly pointed out in other answers. The quoted principle says that the actions within the hypermedia should be dynamically identified.
The actions available for a given resource may change over time (e.g. because someone added/removed a relationship in the meantime), and there may be different actions available for the same resource for different users (e.g. because users have different authorization levels). The idea of HATEOAS is that clients should not make any assumptions about the actions available for a certain resource at any given time. The client should identify the available actions each time it reads that resource.
Edit: The below paragraph may be wrong. See the comments for discussion about it.
On the other hand, clients may have expectations about the data available in the resource: for example, that a book resource must have a title, and that there may be links to the book's author or authors. There is no way of avoiding the coupling introduced by these assumptions, but both service providers and clients should use backward-compatibility and versioning techniques to deal with it.
We are currently working on a very simple webapp, and we would like to "obfuscate" (what would be the right term?) or somehow encode the request parameters, so we can reduce the chance of an idle user sending arbitrary data.
For instance, the url looks like /webapp?user=Oscar&count=3
We would like to have something like: /webapp?data=EDZhjgzzkjhGZKJHGZIUYZT and have that value decoded on the server with the real request info.
Before going into implementing something like this ourselves (and probably doing it wrong), I would like to know if there is something that does this already.
We have Java on the server and JavaScript on the client.
No, don't do this. If you can build something in your client code to obfuscate the data being transmitted back to the server, then so can a willful hacker. You simply can't trust data being sent to your server, no matter what your official client does. Stick to escaping client data and validating it against a whitelist on the server side. Use SSL, and if you can, put your request parameters in a POST instead of GET.
Expansion edit
Your confusion stems from conflating the goal of blocking users from tampering with request data with the need to implement standard security measures. Standard security measures for web applications involve using a combination of authentication, privilege and session management, audit trails, data validation, and secure communication channels.
Using SSL doesn't prevent the client from tampering with the data, but it does prevent middle-men from seeing or tampering with it. It also instructs well-behaved browsers not to cache sensitive data in the URL history.
It seems you have some sort of simple web application that has no authentication, and passes around request parameters that control it right in the GET, and thus some non-technically savvy people could probably figure out that user=WorkerBee can simply be changed to user=Boss in their browser bar, and thus they can access data they shouldn't see, or do things they shouldn't do. Your desire (or your customer's desire) to obfuscate those parameters is naïve, as it is only going to foil the least-technically savvy person. It is a half-baked measure and the reason you haven't found an existing solution is that it isn't a good approach. You're better off spending time implementing a decent authentication system with an audit trail for good measure (and if this is indeed what you do, mark Gary's answer as correct).
So, to wrap it up:
1. Security by obfuscation isn't security at all.
2. You can't trust user data, even if it is obscured.
3. Validate your data.
4. Using secure communication channels (SSL) helps block other related threats.
5. You should abandon your approach and do the right thing. The right thing, in your case, probably means adding an authentication mechanism with a privilege system to prevent users from accessing things they aren't privileged enough to see, including things they might try to access by tampering with GET parameters. Gary R's answer, as well as Dave and Will's comments, hit this one on the head.
If your goal is to "reduce the chance an idle user from sending arbitrarily data," there's another, simpler approach I would try. Make a private encryption key and store it server-side in your application. Whenever your application generates a url, create a hash of the url using your private encryption key and put that hash in the query string. Whenever a user requests a page with parameters in the url, recompute the hash and see if it matches. This will give you some certainty that your application computed the url. It will leave your query string parameters readable though. In pseudo-code,
SALT = "so9dnfi3i21nwpsodjf";

function computeUrl(url) {
    return url + "&hash=" + md5(url + SALT);
}

function checkUrl(url) {
    hash = /&hash=(.+)/.match(url)[1];  // extract the hash parameter
    oldUrl = url.strip(/&hash=.+/);     // the url as it was when signed
    return md5(oldUrl + SALT) == hash;
}
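A runnable Java version of the same idea might look like the sketch below (assuming JDK 17+ for HexFormat). Note it swaps the salted MD5 for an HMAC (HmacSHA256), the standard primitive for keyed signing; the key and URLs are illustrative:

import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class UrlSigner {

    // Server-side secret; plays the role of SALT in the pseudo-code above
    private static final byte[] KEY = "so9dnfi3i21nwpsodjf".getBytes(StandardCharsets.UTF_8);

    static String computeUrl(String url) throws Exception {
        return url + "&hash=" + hmac(url);
    }

    static boolean checkUrl(String url) throws Exception {
        int i = url.lastIndexOf("&hash=");
        if (i < 0) return false;
        String oldUrl = url.substring(0, i);
        String hash = url.substring(i + "&hash=".length());
        return hmac(oldUrl).equals(hash);
    }

    private static String hmac(String data) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(KEY, "HmacSHA256"));
        return HexFormat.of().formatHex(mac.doFinal(data.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        String signed = computeUrl("/webapp?user=Oscar&count=3");
        System.out.println(checkUrl(signed));                             // true
        System.out.println(checkUrl(signed.replace("count=3", "count=9"))); // false: tampered
    }
}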
If you're trying to restrict access to data, then use some kind of login mechanism with a cookie providing a Single Sign-On authentication key. If the client sends the cookie with the key, then they can manipulate the data in accordance with the authorities associated with their account (admin, public user, etc.). Just look at Spring Security, CAS, etc. for easy-to-use implementations of this in Java. The tokens provided in the cookie are usually encrypted with the private key of the issuing server and are typically tamper-proof.
Alternatively, if you want your public user (unauthenticated) to be able to post some data to your site, then all bets are off. You must validate on the server side. This means restricting access to certain URIs and making sure that all input is cleaned.
The golden rule here is disallow everything, except stuff you know is safe.
If the goal is to prevent "static" URLs from being manipulated, then you can simply encrypt the parameters, or sign them. It's likely "safe enough" to tack on an MD5 of the URL parameters, along with some salt. The salt can be a random string stored in the session, say.
Then you can just:
http://example.com/service?x=123&y=Bob&sig=ABCD1324
This technique exposes the data (i.e. they can "see" that x=123), but they cannot change the data.
There is an advantage to "encryption" (and I use that term loosely). This is where you encrypt the entire parameter section of the URL.
Here you can do something like:
http://example.com/service?data=ABC1235ABC
The nice thing about using encryption is twofold.
One, it protects the data (the user can never see that xyz=123, for example).
The other feature, though, is that it's extensible:
http://example.com/service?data=ABC1235ABC&newparm=123&otherparm=abc
Here, you can decode the original payload, and do a (safe) merge with the new data.
So, requests can ADD data to the request, just not change EXISTING data.
You can do the same via the signing technique; you would just need to consolidate the entire request into a single "blob", and that blob is implicitly signed. That's "effectively" encrypted, just a weak encryption.
Obviously you don't want to do ANY of this on the client. There's no point. If you can do it, "they" can do it and you can't tell the difference, so you may as well not do it at all -- unless you want to "encrypt" data over a normal HTTP port (vs TLS, but then folks will wisely wonder "why bother").
For Java, all this work goes in a Filter; that's the way I did it. The back end is isolated from this.
If you wish, you can make the back end completely isolated from this with an outbound filter that handles the URL encryption/signing on the way out.
That's also what I did.
The downside is that it's very involved to get it right and performant. You need a lightweight HTML parser to pull out the URLs (I wrote a streaming parser to do it on the fly so it didn't copy the entire page into RAM).
The bright side is all of the content side "just works", as they don't know anything about it.
There's also some special handling when dealing with JavaScript (as your filter won't easily "know" where there's a URL to encrypt). I resolved this by requiring URLs that need signing to be written in a specific form, "var signedURL='....'", so I can find those easily in the output. Not as crushing a burden on designers as you might think.
The other bright side of the filter is that you can disable it. If you have some "odd behavior" happening, simply turn it off. If the behavior continues, you've found a bug related to encryption. It also lets developers work in plain text and leave the encryption for integration testing.
Pain to do, but it's nice overall in the end.
You can encode data using base64 or something similar. I would encode the arguments themselves in JSON to serialize them.
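A sketch of that idea (using Jackson for the JSON step; remember from the accepted answer above that base64 is encoding, not security):

import java.util.Base64;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper;

public class ParamCodec {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Serialize the parameters as JSON, then Base64-encode for the query string
    static String encode(Map<String, Object> params) throws Exception {
        byte[] json = MAPPER.writeValueAsBytes(params);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(json);
    }

    @SuppressWarnings("unchecked")
    static Map<String, Object> decode(String data) throws Exception {
        byte[] json = Base64.getUrlDecoder().decode(data);
        return MAPPER.readValue(json, Map.class);
    }

    public static void main(String[] args) throws Exception {
        String data = encode(Map.of("user", "Oscar", "count", 3));
        System.out.println("/webapp?data=" + data); // obfuscated, but trivially reversible
        System.out.println(decode(data));           // anyone can do this too
    }
}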
Something like jCryption?
http://www.jcryption.org/examples/