Reduce data traffic in JAX-WS web service


I am exposing a couple domain objects via a SOAP based web service. Some of my domain objects have a large number of fields. I do not want to include values in my web service request/response unless they are needed.
For example, suppose I have a Book domain object with the fields title, genre, and isbn. If I use my web service to update the title of a book, I want the request to include only the title field (omitting the other two fields that aren't being updated).
Likewise, I want my web service clients to be able to specify which fields they want returned when they load books.
This would allow clients to load only the title field, reducing the size of the data going across the wire because the fields that aren't needed would not be included in the response.
Does anyone know of any patterns or best practices to deal with this type of requirement?

You touched multiple problems where each deserves separate explanation:
Reducing traffic - reducing traffic usually means reducing round trips, not reducing payload. You achieve it by implementing coarser-grained operations that perform multiple actions, instead of exposing CRUD operations.
Reducing payload - if you don't want to transfer the whole entity, you should use data transfer objects (DTOs): special objects that carry only the data required for a given operation.
Dynamic response - web services are not supposed to do that. A web service has a fixed interface defined by WSDL, where each message payload is specified by XSD. If you want to change the returned data structure dynamically, you will break this. That doesn't mean it is impossible: you can define that your service operation returns xsd:any (any XML), and it then becomes your duty to prepare the returned XML and your client's duty to parse it.
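As a rough illustration of the "dynamic response" idea, here is a minimal sketch in plain Java of a projection that returns only the fields a client asked for. The Book class, the BookProjection helper, and the field whitelist are assumptions made up for this example; how the resulting map is serialized (for instance into an xsd:any element) is left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical domain object with the three fields from the question.
class Book {
    final String title;
    final String genre;
    final String isbn;

    Book(String title, String genre, String isbn) {
        this.title = title;
        this.genre = genre;
        this.isbn = isbn;
    }
}

class BookProjection {
    // Whitelist of field names a client may request, mapped to their accessors.
    private static final Map<String, Function<Book, String>> ACCESSORS = new LinkedHashMap<>();

    static {
        ACCESSORS.put("title", b -> b.title);
        ACCESSORS.put("genre", b -> b.genre);
        ACCESSORS.put("isbn", b -> b.isbn);
    }

    // Returns only the requested fields, in the order they were requested.
    static Map<String, String> project(Book book, List<String> requestedFields) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String field : requestedFields) {
            Function<Book, String> accessor = ACCESSORS.get(field);
            if (accessor == null) {
                throw new IllegalArgumentException("Unknown field: " + field);
            }
            result.put(field, accessor.apply(book));
        }
        return result;
    }
}
```

Rejecting unknown field names keeps the contract explicit even though the payload shape is no longer fixed by the schema.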

You can either make the fields optional in the XSD data type, or you can specify that in the changeTitle request you don't expect a Book, but only an ID and a string.
When you invent the changeAttributes request and you have optional fields, you have to decide what a missing field means. It could mean "clear this field" or "leave this field untouched".
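A minimal sketch of the optional-fields variant, assuming the convention that a missing (null) field means "leave this field untouched". EditableBook, ChangeAttributesRequest, and BookUpdater are hypothetical names for illustration, not part of any JAX-WS API.

```java
// Hypothetical mutable domain object.
class EditableBook {
    String title;
    String genre;
    String isbn;

    EditableBook(String title, String genre, String isbn) {
        this.title = title;
        this.genre = genre;
        this.isbn = isbn;
    }
}

// Hypothetical request DTO: every field is optional.
// Convention assumed here: null means "leave this field untouched".
class ChangeAttributesRequest {
    String title;
    String genre;
    String isbn;
}

class BookUpdater {
    // Copies only the fields that are present in the request onto the book.
    static void apply(EditableBook book, ChangeAttributesRequest request) {
        if (request.title != null) {
            book.title = request.title;
        }
        if (request.genre != null) {
            book.genre = request.genre;
        }
        if (request.isbn != null) {
            book.isbn = request.isbn;
        }
    }
}
```

With this convention a client cannot clear a field by omitting it, so clearing would need its own explicit signal, such as a separate list of field names to reset.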

Related

What is the main reason to create a DTO nowadays? [duplicate]

Why should I use DTOs/Domain objects when I can just put all business classes in a class library, use them in the business logic, then also pass those same business objects on to boundary classes?
UPDATE:
All are great points, thanks for the help. A follow up question:
Where do you typically place these DTOs? Alongside the Domain objects i.e. in the same namespace?
namespace MedSched.Medical
{
public class MedicalGroup
{
//...
}
public class MedicalGroupDTO
{
//...
}
}

The DTO's provide an abstraction layer for your domain model. You can therefore change the model and not break the contract you have for your service clients. This is akin to a common practice in database design where views and procedures become the abstraction layer for the underlying data model.
Serialization - you may over expose data and send bloated messages over the wire. This may be mitigated by using serialization attributes, but you may still have extra information.
Implicit vs. Explicit Contracts - by exposing domain objects you leave it up to the client to interpret their usage since they do not have the full model at their disposal. You will often make updates to domain objects implicitly based on the presence or removal of associations, or blindly accept all changes. DTOs explicitly express the usage of the service and the desired operation.
Disconnected Scenarios - having explicit messages or DTOs would make it easier for you to implement messaging and messaging patterns such as Message Broker, etc.
DDD - pure DDD demands that domain objects are externally immutable, and therefore you must offload this to another object, typically a DTO.

I can think of 2 basic scenarios for using DTOs:
You're creating your business object from data that's incomplete and will fail validation. For example, you're parsing a CSV file or an Excel file from which your business object is created. If you use data directly from these files to create your business object, it is quite possible to fail several validation rules within the object, because data from such files is prone to errors. Such files also tend to have a different structure than your final business object: having a placeholder for that incomplete data will be useful.
You're transporting your business object through a medium that is bandwidth intensive. If you are using a web service, you will need to use DTOs to simplify your object before transport; otherwise the CLR will have a hard time trying to serialize all your data.

DTOs are Data Transfer Objects, with Transfer being the key word. They are great when you want to pass your objects across the wire and potentially communicate with a different language because they are "light weight" (ie no business logic.)
If your application isn't web service based, DTOs aren't really going to buy you anything.
So deciding to use or not use DTOs should be based on the architecture of your application.

There are times when the data you want to pass around doesn't map exactly to the structure of the business objects, in which case you use DTOs.
Ex: when you want to pass a subset of the data or a projection.
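A small sketch of the "subset of the data" case, written in Java rather than the C# of the question. The fields on MedicalGroup (including internalNotes) and the MedicalGroupMapper helper are assumptions invented for the example.

```java
// Hypothetical domain object; internalNotes stands in for data
// that should not leave the service boundary.
class MedicalGroup {
    final long id;
    final String name;
    final String internalNotes;

    MedicalGroup(long id, String name, String internalNotes) {
        this.id = id;
        this.name = name;
        this.internalNotes = internalNotes;
    }
}

// DTO carrying only the subset of fields the client needs.
class MedicalGroupDto {
    final long id;
    final String name;

    MedicalGroupDto(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

class MedicalGroupMapper {
    // Projects the domain object onto the DTO, dropping internalNotes.
    static MedicalGroupDto toDto(MedicalGroup group) {
        return new MedicalGroupDto(group.id, group.name);
    }
}
```

Because the DTO is a separate type, the domain class can gain or rename internal fields without changing what clients receive.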

How to represent similar objects with different data

We have the following situation. We have a couple of repositories that hold documents. We have written front-end services that deal with documents and document data across the different repositories. We have operations that allow you to, among other things, store new documents and retrieve document metadata.
The problem is, there are different types of documents in the repositories that each have different sets of metadata. For example, all documents in one repository have document name, date added, size, ID, document type and document source. Billing documents also have billing account number and customer name. Policy documents have policy number, insured name and agency code. Some special policy documents also have effective date and packet type.
In the second repository, documents have document name, date added, size, type and location. Invoices (which are Billing documents in the other repository) have account number and customer name, but also invoice date. Policy documents have policy number, insured name, agency code, effective date and policy type. Some special policy documents have cancellation date and amount due.
The reality is more complicated, but this represents the basic issue I'm having.
I don't really have control over the existing metadata fields. Those are defined elsewhere and some of it's legacy. Also, these are SOAP web services, but will eventually become RESTful. But for now, they're defined by a WSDL.
So, what's the best way to represent these things that have many similarities, but some differences?
Some of the considerations:
I'd like to shield the client from as much repository-specific info as possible. In a perfect world, the client shouldn't care if the doc is from one repository or another, although the different fields may make this a pipe dream.
I'd like a single newDocument and getDocumentProperties call to accept and return the pertinent data for each type, rather than have individual new and get calls for each different document type.
I could go with one big fat object with all possible fields and an enum to tell them apart, but that means the client has to magically know what fields apply and what don't.
I could go with a specific object for each possible set of document fields, but then the client has to know whether the doc is going to or coming from a particular repository which is more than I want them to know.
For now, I've gone with the best (or worst?) of both worlds, going with a few high-level abstractions (Policy document, Billing document), converting where I can and leaving any unknown or undefined data for that abstraction empty.
But this means that the client still has to know that, for example, for some Billing docs you'll have invoice date, but for others you won't. Or that for docs from one repository you'll have an ID but for the other you'll have location.
Anyway, I'm looking for best practices for dealing with these sorts of similar, but different objects.

So, what's the best way to represent these things that have many similarities, but some differences?
I think the approach to how to represent/model the data depends on your application requirements and there isn't a globally accepted best practice I know of, some (all?) of the options are:
Map document fields with key value pairs
One fat object with every possible field.
Slim hierarchy with classes containing only shared fields.
Slim hierarchy + dynamic meta-data (e.g. BillingDocument only contains shared fields and also contains a map that contains fields unique to this repository)
Complex hierarchy with sub classes to hold the unique fields (e.g. BaseBillingDocument, RepoOneBillingDocument, Repo2BillingDocument)
Some of the considerations:
I'd like to shield the client from as much repository-specific info as possible. In a perfect world, the client shouldn't care if the doc is from one repository or another, although the different fields may make this a pipe dream.
This is a business issue, not a technical one. Normalise the data by discarding unnecessary fields, by declaring them optional (so clients should expect them to be empty at times), or by computing missing values when they can be derived from other common attributes. Otherwise, live with the fact that you have different subtypes of the same document (BillingDocRepo1, BillingDocRepo2).
I'd like a single newDocument and getDocumentProperties call to accept and return the pertinent data for each type, rather than have individual new and get calls for each different document type.
This is doable in almost all representations. Inheritance and polymorphism are supported in both REST and SOAP web services, and it is also doable if you're using a dynamic schema (a map, for instance, or a class with metadata).
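To make the "slim hierarchy + dynamic meta-data" option more concrete, here is a minimal Java sketch. The class names, the choice of shared fields, and the string-valued map are assumptions for illustration; a real WSDL-bound model would need matching schema types.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Base type holding only fields shared by both repositories.
abstract class Document {
    private final String name;
    private final long sizeInBytes;
    // Repository-specific fields that do not fit the shared model.
    private final Map<String, String> extraFields = new LinkedHashMap<>();

    Document(String name, long sizeInBytes) {
        this.name = name;
        this.sizeInBytes = sizeInBytes;
    }

    String getName() {
        return name;
    }

    long getSizeInBytes() {
        return sizeInBytes;
    }

    void putExtraField(String key, String value) {
        extraFields.put(key, value);
    }

    // Read-only view so callers cannot modify the map directly.
    Map<String, String> getExtraFields() {
        return Collections.unmodifiableMap(extraFields);
    }
}

// Billing fields common to both repositories.
class BillingDocument extends Document {
    private final String accountNumber;
    private final String customerName;

    BillingDocument(String name, long sizeInBytes, String accountNumber, String customerName) {
        super(name, sizeInBytes);
        this.accountNumber = accountNumber;
        this.customerName = customerName;
    }

    String getAccountNumber() {
        return accountNumber;
    }

    String getCustomerName() {
        return customerName;
    }
}
```

A value such as the invoice date, which exists in only one repository, would go into the map (for example under the key "invoiceDate"), so the client sees one BillingDocument type regardless of origin.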

Implementing a RESTful service

I'm building a web service to support an Android e-reader app I'm making for our campus magazine. The service needs to return issue objects to the app, each of which has a cover image and a collection of articles to be displayed. I'd like some general input on two strategies I'm considering, and/or some specific help on a few issues I'm having with them:
Strategy 1: Have 2 DB tables, Issues and Articles: The Issues table contains simply an int id, varchar name and varchar imageURI. Articles contains many more columns (headline, content, blurb, etc.), with each article on a separate row. One of the columns is issueID, which points to the issue to which the article belongs. When issue number n is requested, the service first pulls its data from the Issues table and uses it to create a new Issue object. The constructor instantiates a new List<Article> as an instance variable and populates it by pulling all articles with the matching issueID from the Articles table. What I can't figure out with this option is exactly how to execute it at a single endpoint, so that the app only has to create one HTTP connection to get everything it needs for the issue (or is this not as important as I think it is?).
Strategy 2: Have a single Issues table with the id, name, and imageURI columns, plus a large number of additional text Article1... text Article40 columns. The Articles are packaged into JSONObjects before being uploaded to the server, and these JSONObjects (which will be very long) are stored directly in the database. My worry here is that the text files will be too long, plus I have a nagging suspicion that this strategy isn't in line with best practices (although I can't put my finger on why...)
Also, This being my first web service, and given the requirements specified above, would it be advisable to use the Spring (or some other) framework or am I better off just using JAX-RS?

There are 2 questions here
How to convert your objects to JSON and expose them with a rest service.
How to store/retrieve your data.
To implement your webservices, Jersey is my favorite option. It is the open-source reference implementation of the JSR 311 (JAX-RS). In addition, Jersey uses Jackson to automatically handle the JSON/Object mapping.
To store your data, your second option... is clearly not an option :)
The first solution seems viable.
IMHO, as your application seems tiny, you should not put JPA/Hibernate etc. in place. You should simply make one request by hand with a JOIN between Issues and Articles, populate the requested Issue, then let Jackson automatically convert your object to JSON.
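The "populate the requested Issue" step might look roughly like the sketch below. It leaves out the database access itself and assumes the rows of the JOIN have already been read into a simple IssueArticleRow holder; that class, IssueAssembler, and the exact column selection are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class Article {
    final String headline;
    final String content;

    Article(String headline, String content) {
        this.headline = headline;
        this.content = content;
    }
}

class Issue {
    final int id;
    final String name;
    final String imageUri;
    final List<Article> articles = new ArrayList<>();

    Issue(int id, String name, String imageUri) {
        this.id = id;
        this.name = name;
        this.imageUri = imageUri;
    }
}

// One row of the joined result, e.g. from a query along the lines of:
//   SELECT i.id, i.name, i.imageURI, a.headline, a.content
//   FROM Issues i JOIN Articles a ON a.issueID = i.id WHERE i.id = ?
class IssueArticleRow {
    final int issueId;
    final String issueName;
    final String imageUri;
    final String headline;
    final String content;

    IssueArticleRow(int issueId, String issueName, String imageUri, String headline, String content) {
        this.issueId = issueId;
        this.issueName = issueName;
        this.imageUri = imageUri;
        this.headline = headline;
        this.content = content;
    }
}

class IssueAssembler {
    // Builds one Issue from rows that all belong to the same issue.
    static Issue fromRows(List<IssueArticleRow> rows) {
        if (rows.isEmpty()) {
            throw new IllegalArgumentException("No rows for issue");
        }
        IssueArticleRow first = rows.get(0);
        Issue issue = new Issue(first.issueId, first.issueName, first.imageUri);
        for (IssueArticleRow row : rows) {
            issue.articles.add(new Article(row.headline, row.content));
        }
        return issue;
    }
}
```

The assembled Issue, with its nested articles, can then be returned from a single endpoint, so the app needs only one HTTP request per issue.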

How to transfer data in layer using MVC and 3 tier architecture?

As a design pattern, if I am correct, we use strings (normally called business logic status) to convey messages between layers. But my problem is: if a method returns a string, how do I transfer other data objects? For this situation I am currently using the HTTP request object (which is global) to transfer data, which makes all 3 layers dependent on the presentation layer, so it is not good.
Is there any other way to transfer data between layers?

Passing messages around using strings is considered a bad idea (google "Stringly typed") and definitely not a design pattern. You should create proper objects and pass them between the layers.
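One possible shape for such a "proper object" is a small typed result that carries both a status and the data, so no layer needs the HTTP request object. The Status values and the ServiceResult class below are assumptions sketched for illustration.

```java
// Status codes replace free-form status strings.
enum Status {
    SUCCESS,
    NOT_FOUND,
    VALIDATION_FAILED
}

// Hypothetical result type returned by the business layer.
final class ServiceResult<T> {
    private final Status status;
    private final T payload;      // null unless status is SUCCESS
    private final String message; // human-readable detail, null on success

    private ServiceResult(Status status, T payload, String message) {
        this.status = status;
        this.payload = payload;
        this.message = message;
    }

    static <T> ServiceResult<T> success(T payload) {
        return new ServiceResult<>(Status.SUCCESS, payload, null);
    }

    static <T> ServiceResult<T> failure(Status status, String message) {
        return new ServiceResult<>(status, null, message);
    }

    boolean isSuccess() {
        return status == Status.SUCCESS;
    }

    Status getStatus() {
        return status;
    }

    T getPayload() {
        return payload;
    }

    String getMessage() {
        return message;
    }
}
```

The presentation layer can branch on getStatus() and read getPayload(), while the lower layers stay free of any web-specific types.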

Why are data transfer objects (DTOs) an anti-pattern?

I've recently overheard people saying that data transfer objects (DTOs) are an anti-pattern.
Why? What are the alternatives?

Some projects have all data twice. Once as domain objects, and once as data transfer objects.
This duplication has a huge cost, so the architecture needs to get a huge benefit from this separation to be worth it.

DTOs are not an anti-pattern. When you're sending some data across the wire (say, to a web page in an Ajax call), you want to be sure that you conserve bandwidth by only sending data that the destination will use. Also, it is often convenient for the presentation layer to have the data in a slightly different format than a native business object.
I know this is a Java-oriented question, but in .NET languages anonymous types, serialization, and LINQ allow DTOs to be constructed on-the-fly, which reduces the setup and overhead of using them.

"DTO an AntiPattern in EJB 3.0" (original link currently offline) says:
The heavy weight nature of Entity Beans in EJB specifications prior to EJB 3.0, resulted in the usage of design patterns like Data Transfer Objects (DTO). DTOs became the lightweight objects (which should have been the entity beans themselves in the first place), used for sending the data across the tiers... now EJB 3.0 spec makes the Entity bean model same as Plain old Java object (POJO). With this new POJO model, you will no longer need to create a DTO for each entity or for a set of entities... If you want to send the EJB 3.0 entities across the tier make them just implement java.io.Serializable

OO purists would say that DTO is anti-pattern because objects become data table representations instead of real domain objects.

I don't think DTOs are an anti-pattern per se, but there are antipatterns associated with the use of DTOs. Bill Dudney refers to DTO explosion as an example:
http://www.softwaresummit.com/2003/speakers/DudneyJ2EEAntiPatterns.pdf
There are also a number of abuses of DTOs mentioned here:
http://anirudhvyas.com/root/2008/04/19/abuses-of-dto-pattern-in-java-world/
They originated because of three tier systems (typically using EJB as technology) as a means to pass data between tiers. Most modern day Java systems based on frameworks such as Spring take an alternative, simplified view, using POJOs as domain objects (often annotated with JPA etc...) in a single tier... The use of DTOs here is unnecessary.

Some consider DTOs an anti-pattern due to their possible abuses. They're often used when they shouldn't be/don't need to be.
This article vaguely describes some abuses.

The question should not be "why", but "when".
It is definitely an anti-pattern when the only result of using it is higher cost, whether at run time or in maintenance. I worked on projects having hundreds of DTOs identical to database entity classes. Each time you wanted to add a single field, you had to add it like four times: to the DTO, to the entity, to the conversion from DTO to domain classes or entities, to the inverse conversion, ... If you forgot one of the places, the data became inconsistent.
It's not an anti-pattern when you really need a different representation of the domain classes - flatter, richer, ...
Personally I start with a domain class and pass it around, with proper checks at the right places. I can annotate and/or add some "helper" classes to make mappings to database, to serialization formats like JSON or XML ... I can always split a class to two if I feel the need.
It's about your point of view - I prefer to look at a domain object as a single object playing various roles, instead of multiple objects created from each other. If the only role of an object is to transport data, then it's a DTO.

If you're building a distributed system, then DTOs are certainly not an anti-pattern. Not everyone develops that way, but suppose you have (for example) an Open Social app running entirely off JavaScript.
It will post a load of data to your API. This is then deserialized into some form of object, typically a DTO/Request object. This can then be validated to ensure the data entered is correct before being converted into a model object.
In my opinion, it's seen as an anti-pattern because it's misused. If you're not building a distributed system, chances are you don't need them.
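The deserialize, validate, convert flow described above could be sketched as follows. CreateBookRequest, ValidBook, and CreateBookHandler are hypothetical names, and the two "required" rules are only placeholder validation.

```java
import java.util.ArrayList;
import java.util.List;

// Raw request DTO as it might arrive after deserialization; nothing is trusted yet.
class CreateBookRequest {
    String title;
    String isbn;
}

// Model object that is only ever built from validated input.
final class ValidBook {
    final String title;
    final String isbn;

    ValidBook(String title, String isbn) {
        this.title = title;
        this.isbn = isbn;
    }
}

class CreateBookHandler {
    // Collects all validation problems instead of stopping at the first one.
    static List<String> validate(CreateBookRequest request) {
        List<String> errors = new ArrayList<>();
        if (request.title == null || request.title.trim().isEmpty()) {
            errors.add("title is required");
        }
        if (request.isbn == null || request.isbn.trim().isEmpty()) {
            errors.add("isbn is required");
        }
        return errors;
    }

    // Converts the request into a model object, or fails with every error listed.
    static ValidBook toModel(CreateBookRequest request) {
        List<String> errors = validate(request);
        if (!errors.isEmpty()) {
            throw new IllegalArgumentException(String.join("; ", errors));
        }
        return new ValidBook(request.title.trim(), request.isbn.trim());
    }
}
```

Keeping the request type separate from the model means invalid input never exists as a model object.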

DTO becomes a necessity and not an ANTI-PATTERN when you have all your domain objects load associated objects EAGERly.
If you don't make DTOs, you will have unnecessary transferred objects from your business layer to your client/web layer.
To limit overhead for this case, rather transfer DTOs.

I think people mean it could be an anti-pattern if you implement all remote objects as DTOs. A DTO is merely a set of attributes, and if you have big objects you would always transfer all the attributes even if you do not need or use them. In the latter case, prefer using a Proxy pattern.
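One way to read the Proxy suggestion is a lazy-loading proxy that fetches a heavy attribute only when it is first used. The Report interface, LazyReportProxy, and the Supplier standing in for a remote call are assumptions made for this sketch.

```java
import java.util.function.Supplier;

interface Report {
    String getTitle();

    String getBody();
}

// Proxy that holds the cheap attribute eagerly and loads the expensive one on demand.
class LazyReportProxy implements Report {
    private final String title;
    private final Supplier<String> bodyLoader; // stands in for a remote fetch
    private String body;
    private boolean loaded;

    LazyReportProxy(String title, Supplier<String> bodyLoader) {
        this.title = title;
        this.bodyLoader = bodyLoader;
    }

    @Override
    public String getTitle() {
        return title;
    }

    @Override
    public String getBody() {
        // Fetch the body on first access only, then reuse the cached value.
        if (!loaded) {
            body = bodyLoader.get();
            loaded = true;
        }
        return body;
    }
}
```

A client that only reads the title never triggers the loader, so the large attribute is never transferred. This sketch is not thread-safe.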

The intention of a Data Transfer Object is to store data from different sources and then transfer it into a database (or Remote Facade) at once.
However, the DTO pattern violates the Single Responsibility Principle, since the DTO not only stores data, but also transfers it from or to the database/facade.
The need to separate data objects from business objects is not an antipattern, since it is probably required to separate the database layer anyway.
Instead of DTOs you should use the Aggregate and Repository Patterns, which separates the collection of objects (Aggregate) and the data transfer (Repository).
To transfer a group of objects you can use the Unit Of Work pattern, that holds a set of Repositories and a transaction context; in order to transfer each object in the aggregate separately within the transaction.
