Which is more secured and why JSON or XML - java

I want to implement web services in Java EE whose response is going to be a JSON. This is my first attempt for doing so but before that I just want to know are there any security issues with JSON because everywhere in many blogs I read it is mentioned Like "JSON is not secured in comparison to XML". JSON has several advantages like easy to use, greater speed.
So anyone can explain me the truth whether JSON is really unsecured or not. If so why it is. Please explain with an example.
There are couple old articles on the topic:
JSON vs XML - 2006
concerns about eval
JSON is not as safe as people think it is
Claims only protection for non-public data available via JSON is to use unique urls.
CSRF (Cross Site Request Fogery) - 2007
Array hack highjacking parsing of JavaScript by browser.

This is nonsense. Both, json and xml are just methods for representation of structured data. None of them could be considered as "more secured" or "less secured".

There is no difference security wise between JSON and XML. The "insecurities" referred to by people regarding JSON have to do with the way JSON can (but should never be) parsed in Javascript.
JSON is based on the syntax for coding objects in javascript, so evaluating a JSON result in javascript returns a valid object.
This may open JSON to a variety of javascript injection exploits.
The way to resolve this: don't use eval() to parse JSON in javascript, use a JSON parser and fix any security issues in your server that allow unescaped user generated content in the response.

There is no more secure version. There are other features to consider though:
Example 1
Example 2
It doesn't matter whether you work with java, php or perl. They can all parse json and xml. json is lightweight, though xml can handle more. I would say, start with json unless you really need features of xml.

There is no security benefit to go with one or the other. Both formats are meant to provide a simple protocol for sending data, and neither use encryption by default (you could add something yourself). JSON is generally considered faster, since it takes less characters to assemble. It is also easy to use in JavaScript, since JSON is simply JavaScript Object Notation, and all JavaScript Objects can be converted to or from JSON representation.
Many (especially newer) developers prefer using XML because of its readability. It is structured in such a way that it is much easier for a human to read through it. This of course is what makes it bulkier than JSON, but it is by no means less secure.
Vulnerabilities that can occur as a result of these transfer protocols are a result of bad parsing. Parsers for services on an open network cannot simply assume that the data is valid, since that may lead to attacks such as code injection - but that has nothing to do with JSON or XML.

Both of the formats are representing data therefore there is no difference in security, i have been using JSON for years and never had any security issues.

JSON and XML both serves as a medium for server client communication. So, why are security concerns with one and not other?
The fundamental difference is that JSON(JavaScript Object Notation) as name suggests is very near and dear to JavaScript and hence in design of some JavaScript methods and functionalities, JavaScript treats JSON strings as it's cup of tea and tries to interpret it directly, which give workarounds to attackers to fool JavaScript interpreter to run malicious JavaScript embedded in the string causing vulnerabilities, while XML has to go through a parsing stage to be parsed into an object making it harder to attack with.
2 such JavaScript functionalities are eval() method and tag.
How does these create a security vulnerabilities?
Although web follows the same-origin policy, historically there have been loopholes found to bypass it and malicious sites use them to send cross site request to authenticated user website, breaking the intent of same-origin policy.
Example :
In spite of same-origin policy being in place, web has allowed some tags like <img> <script> to make cross origin GET requests.
let us say you are on a website www.authenticatedwebsite.com, but lured to open a www.malicious.com which has a tag in its html
<script src="www.authenticatedwebsite.com/get-user-order-history" />
Attacker from www.malicious.com use this script tag behavior to access your private data from www.authenticatedwebsite.com.
Now, how does a script tag which is calling a src url, store the url response to a javascript object[to perform operations like POSTing it to malicious site server]?
Here comes the role of JSON and XML proves out to be safer here. As JavaScript understands JSON pretty well, some JavaScript interpreters interprets naked JSON string as a valid JavaScript and run it.
What can running a JSON string potentially do, as it is still not assigned to a variable?
This can be achieved by another fancy hack.
If the returned JSON string represents an array. JavaScript will try to run the constructor of Array class. Now it is possible to override the constructor of Array in JavaScript. You can do something like :
Array = function(){ yourObject = this };
This is basically overriding JavaScript Array constructor, such that whenever JavaScript calls constructor, the current interpreted array is assigned to yourObject, thus giving malicious website access to your private data.
Now, this attack can be used with variants of JSON strings, with more sophisticated hacks.
Although above represents a valid scenario, where JSON can be dangerous as a return format of your GET APIs. This is actually possible in only some versions of some browsers, and as far as I know, all modern versions of famous browsers have mitigated it, but as your user base can be divided across versions of browsers, you need to be careful with GET APIs giving out private information in naked JSON format.
One of the technique used to circumvent this is adding a while(true) in front of your JSON response strings, which will never allow JavaScript interpreter to reach to the actual string. But it creates parsing overhead on your client side.
Another possible mishaps that JSON can cause is use of eval() method on browser client side to parse JSON. As eval() has capability to run script, if your JSON string contains some hidden script which eval() runs and perform dangerous actions what attacker injected script asks it to do, can prove out to be security issues for your system. But as others have mentioned, this attack can be easily prevented by strictly abandoning eval() method as your JSON parser everywhere in client code. This vulnerability plug in is highly important for websites which stores user generated content.

This post does a good job comparing the security issues found in the two data sharing formats. It even has code snippets with explanations of how attacks can be pulled off on the common vulnerabilities like XXE and DTD validation. Then it shows how to remediate/harden against these security issues so that both XML and JSON can be used safely.
https://blog.securityevaluators.com/xml-vs-json-security-risks-22e5320cf529
In terms of security, XML parsers in their default configuration are open to XXE injection attacks and DTD validation attacks, so XML data exchanges need to be hardened if used.
JSON, on the other hand, is in itself secure in its default state, but as soon as JSONP is utilized to bypass Same-Origin Policy restrictions (CSRF attacks), it becomes vulnerable because:
it allows cross-origin exchanges of data.
The author summarizes the comparison here:
In regard to security, processing untrusted Internet-facing requests is one of the most basic functions of an XML or JSON parser. Unfortunately, common XML parsers are not suitable for this purpose in their default configuration; only with hardening to disable external entity expansion and external DTD validation are they safe. Conversely, JSON parsing is almost always safe, so long as the programmer uses modern techniques rather than JSONP. As long as web developers are aware of these security risks and take the steps to defend against them, either option is completely viable in the current web environment.

Related

What is difference b/w org.owasp.esapi.Encoder.encodeForHTML and org.owasp.esapi.Encoder.encodeForJavaScript methods

I Know, we can use encodeForHTML for HTMl and encodeForJavascript for javaScript.
There is a Cross-Site Scripting: "Reflected fortify scan problem" in my code
String errorDesc = HttpServletRequest.getParameter("error_description");
I have to validate this using Encoder but I am confused to use which one should i use between them. As we do not know the return type of HttpServletRequest.getParameter.
1. org.owasp.esapi.Encoder.encodeForHTML
2. org.owasp.esapi.Encoder.encodeForJavaScript
What we have here dear asker is a rather common misunderstanding about the differences between output encoding--which is what you're working with when you look at the Encoder calls, and input validation, which is a completely separate operation that has little to do with the Encoder class.
The Encoder methods you're dealing with here are to be used only when you're presenting data to a user, and only for the correct context. For example, if the application is a "Single Page Application" (SPA) then very likely you're just going to want to ensure that the output is encoded for JavaScript as the client-facing framework will almost certainly be JavaScript.
If you were using an older style of application, then you would encode for HTML anytime you were going to place data between <some_tag> data </some_tag>.
XSS requires you to understand one thing for every variable in your application: Its data flow, from when the value is generated (Server, User, DB, etc.) and understand all of the transformations it might undergo as it traverses to the user and back to the system. If the value starts in the browser, it will enter into some kind of Controller on the backend, and before you process the value you'll whitelist validate it--ESAPI has a validator class--and then if it passes validation you'll ensure that the database will only treat it as data (PreparedStatement, or through use of an ORM framework's utilities.) Best practice is to
Canonicalize the data
Validate against the canonicalized value
If valid, discard the canonicalized value and store the original data
If used properly, the Validator class is defaulted to help you do this.
The methods you're asking about in this question are for instances where user input is being sent back to the browser, either from the database or from a previous request in your session that hasn't yet been persisted.
The main difference is how the output encoding is done. Encoder.encodeForHTML() does HTML entity encoding via the org.owasp.esapi.codecs.HTMLEntityCodec class, whereas Encoder.encodeForJavaScript() uses JavaScript's backslash encoding via org.owasp.esapi.codecs.JavaScriptCodec.
Which one you choose depends on the context of how your "error_description" parameter will be rendered in your application. If it is rendered between HTML tags, use encodeForHTML(), if you are rendering it in purely a JavaScript context, use encodeForJavaScript(). Refer to https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html for a more thorough explanation of all this.

how to separate business logic in RestAssured

We have REST webservice. It operates over JSON data representation. I would like to provide functional testing. I plan to use RestAssured framework. It provides understandable methods for testing correctness of output json.
Example, get("/method").then().assertThat().body("obj.field", equalTo(5));
But one problem arise: if json structure will change, all tests shall be invalid. For example, if field should be renamed to field2 we shall fix all test with occurrences of field. The problem is very similar to web pages testing problem, where we should check presence of some web elements, etc. It was solved introducing by Page Object pattern. Does some similar solution exist for testing of REST api or could you advise some elegant one?
In the example given in your question you validate the entire body of a response object in which case it is probable you will create brittle tests.
However it looks like REST-Assured already provides all the functionality you need to test specific parts of a JSON response:
JSON example
JSON Advanced Examples
Using JSON Path
You can even map objects and then do whatever you wish with the objects constructed, for example validation and manipulation.
See here for more examples.
Just like with an HTML page, one way to write tests less exposed to changes is to use a strategy to locate the target you want to evaluate.
With a web page you would use an XPath query, a CSS Seletor or directly the id to avoid dependecies over the ancestors.
So how to do it with a JSON ?
Well you could use a Regular expression, but it can get really messy or you could use an XPath like query for JSON :
http://goessner.net/articles/JsonPath/
http://defiantjs.com/
So in your case, writing reliable tests is more about what you evaluate rather than the framework you use to do it.
Changes in REST API (especial public) are less frequent than in GUI. When changes in API are introduced they should be marked with new version and do not break old one (in most cases). So keep your tests as simple as possible, without introducing additional patterns, that will have some benefits - you can easily throw them away and write new. Hihger test framework complexity provides hihger maintanence costs. Any way in REST-Assured you may create ResponseSpecification and reuse it in assertions.

how to consume a Restful Web Service (Restful API) in Java

I just want to know the high level steps of the process. Here's my thought on the process:
Assumption: the API returns JSON format
Check the API document to see the structure of the returned JSON
Create a corresponding Java class (ex: Employee)
Make Http call to the endpoint to get the JSON response
Using some JSON library (such as GSON, Jackson) to unmarshall the JSON string to Employee object.
Manipulate the Employee object
However, what if the API returned JSON is changed? it's really tedious task to exam the JSON string every now and then to adjust the corresponding Java class.
Can anyone help me out with this understanding. Thanks
You describe how to consume a json over http API, which is fine since most of the APIs out there are just that. If you are interested in consuming Restful HTTP resources however, one way would be:
Check the API documentation, aka. the media-types that your client will need to support in order to communicate with its resources. Some RESTafarians argue that all media-types should be standardized, so all clients could potentially support them, but I think that goes a bit far.
Watch out for link representations, and processing logic. media-types do not only describe the format of the data, but also how to process them. How to display it if its an image, how to run code that might be part of the message, how to layout onto the screen, how to use embedded controls like forms, etc.
Create corresponding Java classes. If the resources "only" describe data (which they usually do in API context), then simple Java classes will do, otherwise more might be needed. For example: can the representation contain JavaScript to run on the client? You need to embed a JavaScript engine, and prepare your class to do just that.
Make call to a bookmarked URI if you have it. There should be no hardcoded SOAP-like "endpoint" you call. You start with bookmarks and work your way to the state your client need to be in.
Usually your first call goes to the "start" resource. This is the only bookmark you have in the beginning. You specify the media-types you support for this resource in the Accept header.
You then check whether the returned Content-Type matches one of your accepted media-types (remember, the server is free to ignore your preferences), and then you process the returned representation according to its rules.
For example you want to get all the accounts for customer 123456 for which you don't yet have a bookmark to. You might first GET the start resource for account management. The processing logic there might describe a link to go to for account listings. You follow the link. The representation there might give you a "form" in which you have to fill out the customer number and POST. Finally, you get your representation of the account list. You may at this point bookmark the page, so you don't have to go through the whole chain the next time.
Process representation. This might involve displaying, running, or just handing over the data to some other class.
Sorry for the long post, slow day at work :) Just for completeness, some other points the client needs to know about: caching, handling bookmarks (reacting to 3xx codes), following links in representations.
Versioning is another topic you mention. This is a whole discussion onto itself, but in short: some people (myself included) advocate versioning the media-type. Non-backwards compatible changes simply change the media type's name (for example from application/vnd.company.customer-v1+json, to application/vnd.company.customer-v2+json), and then everything (bookmarks for example) continues to work because of content negotiation.
There are many ways to consume RESTful APIs.
Typically, you need to know what version of the API you are going to use. When the API changes (i.e. a different version is exposed) you need to decide if the new functionality is worth migrating your application(s) to the latest and greatest or not...
In my experience, migrating to a new API always requires some effort and it really depends on the value of doing so (vs. not doing it) and/or whether the old API is going to be deprecated and/or not supported by the publisher.

Is a REST request normally JSON output or something else?

A full REST api like with Java's jax-rs contains definitions for defining a path for a resource, uses the full GET, POST, PUT requests.
But typically when I encounter a REST API, it is typically a standard HTTP GET request and the response is a JSON output. It seems like the meat of a real-world REST request is using JSON output but the true definition of REST allows for XML, JSON or other output types.
For example, the twitter API has "JSON" output, they use a GET request and here are some of the URL's:
https://dev.twitter.com/docs/api/1.1/get/search/tweets
And you can still use the "GET" parameters to modify the request. It seems like the twitter 'search/tweets' function is just a simple http request with a well defined URI path that happens to return a JSON response. Is that really REST?
What is a REST api to you?
On Jax-rs
http://en.wikipedia.org/wiki/Java_API_for_RESTful_Web_Services
(Sorry if this is slightly subjective or anecdotal but I believe developers were wondering about this)
REST (Representational State Transfer) is a vague architectural design pattern which was first written about in a paper by Roy Fieldings (aka the guy who created HTTP).
Most of the time (99% of the time) when somebody wants a REST API they mean they want a web API where they send a Request containing an HTTP verb and a URL which describes the location of a Resource that will be acted upon by the HTTP verb. The web server then performs the requested verb on the Resource and sends back a Response back to the user. The Response will usually (depending on the HTTP verb used) contain a Representation of the resulting Resource. Resources can be represented as HTML, JSON, XML or a number of other different mime types.
Which representation used in the response doesn't really indicate whether an API is RESTful or not; it's how well the interface's URLs are structured and how the web server's behaviors are defined using the HTTP Verbs. A properly compliant REST API should use a GET verb to only read a resource, a POST verb to modify a resource, a PUT to add/replace a resource, and a DELETE to remove a resource. A more formal definition of the expected verb behaviors are listed in the HTTP Specification.
REST is (in a nutshell) a paradigm that resources should be accessible through a URI and that HTTP-like verbs can be used to manipulate them (that is to say, HTTP was designed with REST principles in mind). This is, say, in contrast to having one URI for your app and POSTing a data payload to tell the server what you want to achieve.
With a rough analogy, a filesystem is usually RESTful. Different resources live at different addresses (directories) that are easy to access and write to, despite not being necessarily stored on disk in a fashion that reflects the path. Alternatively, most databases are not RESTful - you connect to the database and access the data through a declarative API, rather than finding the data by a location.
As far as what the resource is - HTML, JSON, a video of a waterskiing squirrel - that is a different level of abstraction than adhering to RESTful principles.
REST is a pretty "loose" standard, but JSON is a reasonable format to standardize around. My only major concern with JSON overall is that it doesn't have a method for explicitly defining its character encoding the way XML does. As for the verbs, sticking to meaningful usages of the verbs is nice, but they don't necessarily map one for one in every situation, so as usual, use common sense :)
It can be JSON, it can be XML. JSON is not exactly industry "standard," but many developers like it (including me) because it's lightweight and easy.
That's pretty much it. REST is designed to be simple. The gist of it is that each url corresponds to a unique resource. The format of the resource is commonly json, but can be anything, and is typically determined by the "extension" or "format" part of the url. For example, "/posts/1.json" and "/posts/1.xml" are just two different representations of the same logical resource, formatted in json and xml, respectively. Another common trait of a RESTful interface is the use of HTTP verbs such as GET, PUT, and POST for retrieving, modifying and creating new resources. That's pretty much all there is to it.

Would using a WSDL to generate REST clients be the wrong direction?

I'm out to create a fairly simple web service for intranet use. The service will ultimately be the interface to a DB that will allow me to keep track of what the various internal tools within the company are doing. I'm thinking I want a web service so that various tools (and thus different languages) within the organization can easily update the DB without needing to know the schema.
I've read many of the related REST vs SOAP questions already that come up from searching https://stackoverflow.com/search?q=soap+rest but I'm not sure I've found my answer.
My dilemma seems to be that I want the simplicity of REST while also having the code generation abilities of a WSDL, which seems to imply SOAP. It is of utmost importance to me that various internal tools (JAVA, Perl, Python, PHP, C++) be able to talk to this service and it would seem silly to have to rewrite/maintain the interface layer for each of these languages manually when the WSDL route would do that for me. From what I can tell, if the WS needs to change, you would update the WSDL, regenerate the client stubs, and make any changes necessary to the code that uses the stubs (which would need to be done anyway).
For example - say I have a tool written in JAVA that utilizes a RESTful web service. I imagine that the tool will have specific code for dealing with certain URLs, launching the request, doing something with the response, translating that into some data structure if I'd like, etc. Now lets say I also have a Perl tool doing the same thing. Now I'll need Perl code to do the same, make requests on specific URLs get the responses, do something with them, etc. In each case, and thus in C++ and Python and C#, where code cannot be shared, eventually I'll end up with a wrapper classes/methods that hide a lot of that ugliness from me. I'd much rather call a function on a class that returns my data encapsulated in an object instead of having to worry about the URL, the arguments, the response, etc. Sure, maybe its not a lot of code in any particular place but it starts to add up over time. Multiply that out over each of the tools and now when I make a change to the service I must go update the URLs in each CRUD operation and all that goes along with that. I guess I imagine that with a WSDL that is the aspect that is done for you. Your code simply interacts with the stubs. What the stubs do, who cares? Urls, arguments, responses - if anything changes just regenerate the stubs from the WSDL. If that process causes your code to break, so be it, but at least I didn't also have to update all the code that deals with the specifics of making the request and dealing with the response. Is this really not a problem? Perhaps what I need to do is just create a service and a few clients and see what I'm really up against.
While I'm a fairly seasoned programmer with experience in JAVA, Perl, Python, C++, etc, this is the first time I've considered authoring a WS and don't have prior experience with other WSs, so I'm looking for some guidance. Do I just go with WSDL/SOAP and forget about what everybody is saying about how popular, simple, and useful REST is?
I don't get the code generation issue here.
REST rarely requires any code generation of any kind. It's just HTTP requests with simple JSON (or XML) payloads.
You use the existing HTTP libraries (e.g. Apache Commons, or Python urrlib2). You use existing JSON (or XML) libraries. (the Jersey project has a nice JAXB-JSON conversion, for example).
What's "generated"? Our RESTful library in Java and Python are nearly identical and simply make REST requests through the HTTP library.
class OurService {
def getAResource( String argValue ) {
path = { "fixed", argValue };
URI uri= build_path( path );
return connection.get( uri )
[I've left out the exception handling.]
Are you trying to layer in the complex SOAP interface/implementation separation?
A client "written in JAVA that utilizes a RESTful web service"... A "Perl tool doing the same thing" ... "in C++ and Python and C#".
Correct.
"where code cannot be shared"
Correct. The code cannot be shared. You have to write each client in the appropriate language. Writing some kind of "generator" to create this code from WSDL is (1) a huge amount of work and (2) needless. Each language has unique syntax and unique libraries for making REST requests. But it's so simple and generic that there's hardly anything there.
The canonical example in Python is
class Some_REST_Stub( object ):
def get_some_resource( self, arg ): # This name is from the WSDL
uri = "http://host:port/path/to/resource/%s/" % arg # This path is from the WSDL
data= urllib2.open( uri )
return json.load( data )
This block of code is shorter than the WSDL required to describe it. And it seems easier to change because the name is the method name and the URI is simply a string.
In most languages the client is approximately that simple. I just wrote a bunch of REST client code in Java. It's wordier, but it's generic stuff. Build a request, parse the JSON that comes back. That's it.
A RESTful WSDL declaration buries two pieces of trivial information in a lot of XML.
It provides an "interface name" for the URI.
It documents the meaning of GET, PUT, POST and DELETE by providing Stub class method names.
It doesn't help you write much code, since there isn't much code. Note that all REST requests have the same generic HttpRequest and HttpResponse structure. The request contains generic an Entities. The response, also, contains a generic Entity that must be parsed. There's very little to REST -- the point is to be as simple as possible.
It largely eliminates the need for WSDL since everything is a generic JSONObject or XML URLEncoded and sent as a string.
"I'd much rather call a function on a class that returns my data encapsulated in an object instead of having to worry about the URL, the arguments, the response, etc."
Correct, you'll have to write a "stub" class. It has almost no code, and it will be shorter than the WSDL required to describe it.
"Multiply that out over each of the tools and now when I make a change to the service I must go update the URLs in each CRUD operation and all that goes along with that."
You can either update the stub class in each language for each client. Or you can update the WSDL and then regenerate each client. It seems like the same amount of work, since the URI is trivially encapsulated in the client code.
"I guess I imagine that with a WSDL that is the aspect that is done for you."
I'm unclear on what's "done". Translating the wordy and complex WSDL into a simple stub class? I suppose this could be helpful. Except the WSDL is bigger than the stub class. I'm suggesting that you'll probably find it easier to simply write the stub class. It's shorter than the WSDL.
"Your code simply interacts with the stubs."
Correct.
"What the stubs do, who cares? Urls, arguments, responses - if anything changes just regenerate the stubs from the WSDL."
Sadly, almost none of this requires any real programming. Generating it from WSDL is more work than simply writing it. URI's a strings. Arguments are generic JSONObjects. Responses are generic HttpResponses including a JSONArray. That's it.
"I didn't also have to update all the code that deals with the specifics of making the request and dealing with the response."
Except, there aren't any interesting specifics of making the request. HTTP is simple, generic stuff. GET, POST, PUT and DELETE are all nearly identical.
Fossill,
I recommend that you do not bother to learn SOAP for this. Ws-* has a very high learning curve and the (unnecessary) complexity will likely eat you alive.
Looking at your skill set (Java, Perl, Python, C++) you should be very satisfied with a REST (or at least HTTP-based) approach. And: you'll get results very fast.
As S.Lott said, do not worry about the code generation. You'll not need it.
For questions, I suggest you join rest-discuss on Yahoo groups:
http://tech.groups.yahoo.com/group/rest-discuss/
You usually get immediate help with all things REST there.
Personally, I have yet to see any use case that could benefit from using WS-*.
Jan
The code-generation aspect that you value is a maintenance item, but I question its real worth.
Be it thru a WSDL document or your own grammar documentation for a REST-style implementations, clients have to comply to the published interface. The WS/SOAP (code-generation) development model might have more tools, but I think that's because it's clunky and needs them.
There's no impact in the 'integrateability' of your web service. That's an issue of publishing the formal interface (in whatever form that takes), in either case. And the speed with which you move from an interface change to an implementation change is arguably faster with REST-style services. Firing up (and fighting with) WS-* code generation tools takes time.
FYI - REST does have a WSDL-like auto-generation schema definition called WADL. But almost no one uses it.
Apache CXF has Java support for REST clients that gives you the same sorts of 'code generation' advantages, in many cases, as full SOAP.

Categories