GET response having URLs in them - Java

We are implementing a REST-based GET operation that returns a list of multiple URIs in the response payload to the client. The client later takes each of those URIs and does a GET on each individual URI to retrieve a separate payload. Aren't URIs supposed to be returned only in the Location or Content-Location header, after a new resource is created by POST?
Does the kind of implementation below violate REST standards?
**Initial Call**
GET /AllURIs
HTTP 200 OK
content-type: application/xml
<URIs>
<URI> /somelocation/1 </URI>
<URI> /somelocation/2 </URI>
<URI> /somelocation/3 </URI>
<URI> /somelocation/4 </URI>
<URI> /somelocation/5 </URI>
</URIs>
**Later Call**
GET /somelocation/1
<NewObject>
.........
</NewObject>

URLs can be returned in scenarios other than posting a new resource, like pagination.
If you have multiple related URLs for a resource, the best way IMO is to add them in the Link header instead of returning them in the response payload. We have used this approach for pagination URLs, where we sent the next, previous, first and last URLs as part of the Link header.
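For illustration, here is a minimal sketch of that approach, assuming the servlet API and a hypothetical /somelocation resource with page-number pagination:

import javax.servlet.http.HttpServletResponse;

// Hypothetical helper: advertise pagination URIs in a Link header (RFC 8288)
// instead of embedding them in the response body.
public class PaginationLinks {
    static void addLinkHeader(HttpServletResponse response, int page, int lastPage) {
        String base = "/somelocation?page=";  // assumed resource path
        StringBuilder link = new StringBuilder();
        link.append("<").append(base).append(1).append(">; rel=\"first\"");
        if (page > 1) {
            link.append(", <").append(base).append(page - 1).append(">; rel=\"prev\"");
        }
        if (page < lastPage) {
            link.append(", <").append(base).append(page + 1).append(">; rel=\"next\"");
        }
        link.append(", <").append(base).append(lastPage).append(">; rel=\"last\"");
        response.setHeader("Link", link.toString());
    }
}

The client can then follow rel="next" and friends without parsing the payload at all.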
Having said that, if the sole purpose of your REST request is to obtain (GET) a list of URLs and that is how you have designed your resources, then it should also be okay to return URLs in the response body.

You should use absolute URLs rather than relative ones. You may use a structure like the one you propose - that is OK - but you may also consider using Atom links.
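For instance, a hedged sketch of the same listing payload using absolute URLs and Atom-style link elements (the host name is assumed):

<URIs xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:link rel="item" href="http://example.com/somelocation/1"/>
  <atom:link rel="item" href="http://example.com/somelocation/2"/>
  <atom:link rel="item" href="http://example.com/somelocation/3"/>
</URIs>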

Related

How to set Cache-Control for many URLs

I am starting out in web security and I have to control caching on the portal; this portal has many URLs. I understand that I need to set the headers with this:
response.setHeader("Cache-Control","no-cache,no-store,must-revalidate");
response.setHeader("Pragma", "no-cache");
But my question is: is the code above valid for all the URLs whose caching I want to control, or how do I set these headers for all the URLs or for one specific URL?
Assuming you have access to both the request and response objects, you can use one of the following methods of the HttpServletRequest object in your controller method to decide when to set those response headers:
- getPathInfo()
- getRequestURL()
- getRequestURI()
I mean something like this (note that getRequestURL() returns a StringBuffer, so convert it before comparing):
if (request.getRequestURL().toString().equals("http://someurl"))
{
    // do your stuff
}
Put this code in a web filter and map the filter to all the URLs where you want to disable caching.
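As a concrete sketch (hypothetical class name and URL pattern, assuming the Servlet 3.0+ API with @WebFilter):

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebFilter;
import javax.servlet.http.HttpServletResponse;

// Hypothetical filter that disables caching for every URL it is mapped to.
// Adjust urlPatterns to the paths you want to control.
@WebFilter(urlPatterns = {"/portal/*"})
public class NoCacheFilter implements Filter {

    @Override
    public void init(FilterConfig filterConfig) {
        // nothing to initialise
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Same headers as in the question, applied to every mapped URL.
        HttpServletResponse resp = (HttpServletResponse) response;
        resp.setHeader("Cache-Control", "no-cache,no-store,must-revalidate");
        resp.setHeader("Pragma", "no-cache");
        chain.doFilter(request, response);
    }

    @Override
    public void destroy() {
        // nothing to clean up
    }
}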

How to get http response header in apache jena during calling Method FileManager.get().loadModel(url)

I am loading a model in Apache Jena using the function FileManager.get().loadModel(url). I also know that there may be some URLs in the HTTP response Link header. I want to load models from the links (URLs) in the Link header as well. How do I do that? Is there any built-in functionality to get access to the response headers and process the Link header?
FileManager.get().loadModel(url) packages up reading a URL and parsing the results into a model. It is packing up a common thing to do; it is not claiming to be comprehensive. It is quite an old interface.
If you want detailed control over the HTTP handling, see if the HttpOp (lower-level) mechanism helps; otherwise do the handling in the application and hand the input stream for the response directly to the parser.
You may also find it useful to look at the code in RDFDataMgr.process for help with content negotiation.
I don't think that this is supported by Jena, and I don't see any reason for doing so. The HTTP request is made to get the data and maybe also the response type. If you want to get the URLs from some header fields, why not simply use plain old Java:
URL url = new URL("http://your_ontology.owl");
URLConnection conn = url.openConnection();
Map<String, List<String>> map = conn.getHeaderFields();
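For completeness, a hedged sketch that combines the two ideas, assuming a recent Apache Jena (org.apache.jena.riot.RDFDataMgr) and a simple one-URL-per-entry Link header format; a production version would need a proper Link header parser:

import java.net.URL;
import java.net.URLConnection;
import java.util.List;
import java.util.Map;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.RDFDataMgr;

public class LinkHeaderModels {

    // Loads the model at ontologyUrl and also merges in any models referenced
    // by the HTTP Link response header (read with plain Java, as above).
    public static Model loadWithLinkedModels(String ontologyUrl) throws Exception {
        URLConnection conn = new URL(ontologyUrl).openConnection();
        Map<String, List<String>> headers = conn.getHeaderFields();

        Model model = RDFDataMgr.loadModel(ontologyUrl);  // note: the URL is fetched a second time here

        // Naive parsing: each entry is assumed to look like <http://...>; rel="..."
        List<String> linkHeaders = headers.get("Link");
        if (linkHeaders != null) {
            for (String link : linkHeaders) {
                int start = link.indexOf('<');
                int end = link.indexOf('>');
                if (start >= 0 && end > start) {
                    model.add(RDFDataMgr.loadModel(link.substring(start + 1, end)));
                }
            }
        }
        return model;
    }
}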

Anonymize Amazon public URL, decode using a script on Nginx

I have been wondering if it's possible to anonymize a public URL. When a user makes a request with this anonymized public URL, let Nginx decode, fetch and serve the URL.
Example
Public URL http://amazon.server.com/location/file.html
Anonymized URL https://amazon.server.com/09872340-932872389-390643289/983724.html
Nginx decodes 09872340-932872389-390643289/983724.html to location/file.html
Nginx has the reverse logic to decode, whereas the remote server has the logic to anonymize the URL.
Question
All I need to know is how Nginx would decode the anonymized URL. Nginx receives the anonymized URL request; there has to be a way to decode it.
This is an answer to the updated question:
Question: All I need to know is how Nginx would decode the anonymized URL. Nginx receives the anonymized URL request; there has to be a way to decode it.
Nginx would make a request to a script, e.g., either through proxy_pass or fastcgi_pass et al.
The script could decode the URL and provide the actual URL through a Location HTTP Response Header with a 302 Found HTTP Status.
Nginx would then have the decoded URL stored in the $upstream_http_location variable. It could subsequently be used in another proxy_pass et al within a named location @named, to which you could redirect the processing of the original request from the user through error_page 302 = @named.
In all, each user request would be processed twice within nginx, but it'll all be transparent to the user -- they simply receive the resource through the original URL, with all redirects being done internally within nginx.
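For illustration, a minimal sketch of such a decoding script as a Java servlet (hypothetical class name and in-memory mapping; the host name is taken from the example in the question). Nginx would proxy the incoming request to this servlet and then act on the Location header it returns:

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical decoder: maps an anonymized path back to the real one and
// answers with "302 Found" plus a Location header, which nginx then sees
// as $upstream_http_location.
public class DecodeServlet extends HttpServlet {

    // Assumed mapping store; in practice this would be a database or cache.
    private static final Map<String, String> MAPPING = new ConcurrentHashMap<>();
    static {
        MAPPING.put("/09872340-932872389-390643289/983724.html", "/location/file.html");
    }

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String real = MAPPING.get(req.getRequestURI());
        if (real == null) {
            resp.sendError(HttpServletResponse.SC_NOT_FOUND);
            return;
        }
        resp.setStatus(HttpServletResponse.SC_FOUND);  // 302
        resp.setHeader("Location", "http://amazon.server.com" + real);
    }
}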
Define "anonymize" for a URL? You can use any of the same methods as URL shorteners such as http://bitly.com. But that is not truly anonymous, since there is a definite mapping between the shortened URL and the target public URL. If you make this per-user, there is still a mapping, but it is user based.
It looks like what you are suggesting is a variation on the above scheme where, instead of sending the user to the target URL via a redirect, you want your server to actually fetch the content and return it to the user. You need to be aware of content linked from the public URL, such as style sheets and images, and adjust it accordingly. Many of the standard proxies have this kind of functionality built in. Also take a look at
https://github.com/jenssegers/php-proxy
http://search.cpan.org/~book/HTTP-Proxy-0.304/lib/HTTP/Proxy.pm.
If you are planning to build your own these can serve as a base.
I think what you want to do here is somewhat similar to another question I've answered in the past, where for each request by the client, you effectively want to make two requests to two different upstreams under the hood (first one to an upstream capable of decoding the URL, second one to actually fetch said decoded URL), but, of course, only return one result.
https://serverfault.com/questions/202011/nginx-and-2-upstreams/485044#485044
As mentioned on serverfault, you could use error_page to process another request, after the first one is complete. You could then use $upstream_http_ to make the subsequent request based on the original one, for example, using $upstream_http_location.
You might also want to look into the X-Accel-Redirect header, which is introduced in this context in the documentation for proxy_ignore_headers.

Watson Content Analytics: how to make a web crawler plug-in that gets data by sending a POST request?

I have a WCA 3.5.0 server and I need to get documents from a site using the web crawler.
The problem is that I have to send a POST request to the site to get some data (initially my site consists only of a form with some fields and a submit button to send the request to the server). So my POST request body should be something like this:
{"DateFrom":"2000-01-01T00:00:00","DateTo":"2030-01-01T23:59:59","Bundles":[{"Name":"the test name that i passed","Type":-1}],"Company":[],"Transaction":[],"Text":""}
I was thinking about making a prefetch plugin for the web crawler.
But from the documentation I've found, it looks like this is hardly possible:
"The first element ([0]) in the argument array that is passed to your
plug-in is an object of type PrefetchPluginArg1, which is an interface
that extends the interface PrefetchPluginArg. This is the only
argument and the only argument type that is passed to the prefetch
plug-in."
The PrefetchPluginArg1 class has only getHTTPHeader(), setHTTPHeader(), getURL(), setURL(), doFetch() and setFetch(),
where:
The getHTTPHeader method returns a String that contains all of the content of the HTTP request header that the crawler sends so that the crawler can download the document.
The getURL method returns the URL (in String form) of a document that
the crawler downloads. You can use this URL to decide if the document
requires additional information in the request header, such as a
cookie.
And it looks like there is no way to change request body.
So, is it really possible to control the POST request body, and not only the header? If so, can you please share some information about ways of solving this task?

How does doGet() support bookmarks?

Reading the link below, I noted that "doGet() allows bookmarks".
http://www.developersbook.com/servlets/interview-questions/servlets-interview-questions-faqs.php : search "It allows bookmarks"
Can anyone tell me how this works and what the use of it is?
All the parameters of a GET request are contained in the URL, so when you request a resource with GET, the request can be formed from the URL itself.
Consider the example www.somesite.com/somePage.jsp. This generates a GET request because we are asking for the resource somePage.jsp.
If you are asking for a resource, then it is a GET request.
GET requests are used to retrieve data.
Any GET request calls the doGet() method of the servlet.
GET requests are idempotent, i.e. calling the same resource again and again does not cause any side effects on the resource.
Hence, a GET request can be bookmarked.
EDIT :-
As suggested by Jerry Andrews, POST requests, unlike GET requests, do not carry their query data in the URL, so the resource cannot be formed properly from the URL alone. Hence they are not bookmarked.
It means that if you bookmark the URL of a servlet that has doGet() implemented, you can always get the same page again when you revisit it. This is very common when you have searches, links to products, news, etc.
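To make this concrete, a minimal sketch (hypothetical SearchServlet with an assumed parameter name q): because the whole query travels in the URL, bookmarking /search?q=shoes reproduces the same page on every visit.

import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical search servlet: all state needed to rebuild the page is in the
// query string, e.g. GET /search?q=shoes, so the URL can be bookmarked.
public class SearchServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String query = req.getParameter("q");  // read straight from the bookmarked URL
        resp.setContentType("text/html");
        PrintWriter out = resp.getWriter();
        // Escaping omitted for brevity; the same URL always renders the same results.
        out.println("<h1>Results for: " + (query == null ? "" : query) + "</h1>");
    }
}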
