Hendry Luk — Sheep in Fence

March 23, 2009

Extensible Query with Specification Pattern

Filed under: Software Development — Hendry Luk @ 11:38 am
Tags: ,

Writing finder methods for data repositories is problematic. The following is an example of pretty simplistic customer-search screen:
Email Address: [________]
Name: [_______]
Age: from [__] to [__] year old
Is terminated: [ ]Yes [ ]No
{Search}

Now how are we going to implement data query behind this search? I’ll go through several alternatives. (Jump to solution if you can’t care less). Let’s start from the simplest approach.

1. Repository Finder Methods

IList<Customer> result = customerRepository.SearchByBlaBlaBloodyBla(email, name, ageFrom, ageTo, isTerminated);

Let me give more context about what I’m working on. I’m developing an application that is intended to be an extensible vertical CRM framework. The client will provide their own new screens/functionalities by writing specific plugin implementation in their own separate assembly, without requiring modification on the framework (Open-Closed Principle).
The approach you just saw above tightly couples the repository API with specific UI design. Any change with the design of the search screen will require the repository API to be reworked. And should we create one repository method for each search screen? Furthermore, when a new search screen or report functionality is plugged into the system by adding new plugin, we need to somehow extend the data repository API to cover each of those specific screen scenarios. This is not an easily extensible architecture.
The upside, this approach is simple and very easy to mock for unit-test. When flexibility is not an issue, I would go for this approach.

2. Lambda-Expression Repository

IList<Customer> result = repository.FindAll<Customer>(
	x => x.EmailAddress == emailAddress && x.IsTerminated == isTerminated);  // and so on

This code uses repository API from FluentNHibernate. I like this API because we only have one single general-purpose Repository. It decouples the repository completely from specific UI design. However I’m not comfortably about leaking naked Linq outside of repository. Exposing Linq to other layer will scatter the database concern all across application. Let’s consider what happens if we decides to refactor IsTerminated property that is currently implemented as a column in DB into C# code, say:

public bool IsTerminated {get {return this.TerminationDate != null; }};

The earlier Linq statement (possibly scattered all over the place) will start to fail since Linq is unable to map IsTerminated property into a correct SQL where clause.

3. Pipe and Filter Pattern

IQueryable<Customer> result = repository.All<Customer>().WithEmail(emailAddress).AgeAbove(ageFrom).AgeBelow(ageTo);
if(isTerminated != null)
	result = (isTerminated)? result.IsTerminated(): result.IsNotTerminated();	

Or in this case, that should be wrapped into:

IList<Customer> result = repository.All<Customer>().WithBlaBloodyBla(emailAddress, name, ageFrom, ageTo, isTerminated).List();

This approach leverages fluent IQueryable and extenssion-methods. It still exposes IQueryable which leaks database concern outside repository, but it’s much better since the query is properly encapsulated behind easy-to-read and maintainable extenssion methods.
In the above example with isTerminated check, it’s obvious that this approach is doing pretty well in handling dynamic query that is very difficult to express using previous lambda-expression approach. But the flexibility is pretty limited, or to be specific, you can only chain multiple filters in an AND relationship.
Another problem of this approach, which is actually the main reason I steer away from approaches #2 #3 #4, is on unit-testing. Yes it is very easy to unit-test each of the filter independently, but it is extremely difficult to mock out those filters to unit-test services that depends on it. I’ll describe the problem in the next approach.

4. Specification pattern

IList<Customer> result = repository.FindAll<Customer>(new WithinAgeCustomerSpecification(ageFrom, ageTo));

My prefered solution is largely derived from specification pattern, I’ll give extra highlight to this approach later. Long story short, IMO this approach is best since it doesn’t leak any data-concern and linq to outside repository. It also separates the responsiblities between loading/saving domain entity (repository) and querying (specification). I’ll start with the problem.
As mentioned, it’s very easy to unit-test each of the specification using the infamous in-memory/sqlite repository testing. But it’s incredibly difficult to unit-test the UI controller and application layer that uses the specification.
Just to give a concrete illustration, this is how I write unit-test had I used the approach #1. (Simplified to search age only)

customerRepository.Expect(x => x.SearchByAgeBetween(20, 30)).Return(stubCustomers);
ViewResult view = customerSearchController.Search(20, 30); 
Assert.That(view.Model, Is.EqualTo(stubCustomers);

But anyone has suggestion how I could test the following controller (simplified)?

public class CustomerController: Controller
{
	IRepository repository; //Injected
	
	public ActionResult Search(int? ageFrom, int? ageTo)
	{
		var customers = repository.Query(new WithinAgeCustomerSpecification(ageFrom, ageTo));
		return View("search", customers);
	}
}

That little call to “new WithinAgeCustomerSpecification(..)” makes it virtually impossible to mock the specification and take it out from the test concern. Linq and Extension method in approach #2 and #3 certainly don’t help.
Why do we care to mock the specification? Because, mind you again, testing queries _IS_ painful! It’s tedious to setup stub-data and verify query result. Each of the specification has had this kind of unit-test themselves, and we certainly _DO_NOT_ want to repeat the test in the controller. For sake of discussion, this is how the unit-test for the controller would look like using unmocked Specification.

ShouldLoadSearchView();
CanLoadCustomerByEmailAddressToViewData();
CanLoadCustomerByNameToViewData();
CanLoadCustomerByAgeToViewData();
CanLoadCustomerByNameAndEmailAddressToViewData();
// etc etc

Each test-case deals with tedious in-memory/sqlite stub data. I don’t even understand why I need to care about data and Sqlite to unit-test UI/Application layer. It just doesn’t make sense.
And guess how the unit-test for the specification looks like.

CanSearchByEmailAddress();
CanSearchByName();
CanSearchByAge();
CanSearchByNameAndEmailAddress();
// etc etc

That’s right, duplication. Not to mention tediously data-driven. Generally, you want to avoid testing that involves data and query. For comparisson, this is how the test for the controller would look like with mocked specification.

ShouldLoadSearchView();
ShouldSearchCustomerUsingCorrectSpecification();
ShouldLoadSearchResultToViewData();

Yes, that’s all we care: “controller should use correct specification to search the customer”. We don’t care if the specification actually does what it claims it does. That’s for other developers to care.

Solution

By wrapping Specifications into a factory, we decouple the controller from Specification implementation.

public class CustomerController: Controller
{
	IRepository repository; //Injected
	ICustomerSpecFactory whereCustomer; //Injected
	
	public ActionResult Search(int? ageFrom, int? ageTo)
	{
		var customers = repository.Query(whereCustomer.HasAgeBetween(ageFrom, ageTo));
		return View("search", customers);
	}
}

Unit-test is a breeze.

whereCustomer.Expect(x => x.HasAgeBetween(20, 30))
	.Return(stubSpec = MockRepository.Generate<Specification<Customer>>);
customerRepository.Expect(x => x.Query(stubSpec)).Return(stubCustomers);

var view = customerController.Search(20, 30); // EXECUTE CONTROLLER ACTION
Assert.That(view.Model, Is.EqualTo(stubCustomer);

EDIT: I posted a better way to write unit-test for this

I actually like it a lot! The specification is also amazingly flexible to mix and play. E.g.:

repository.Query((whereCustomer.HasAgeBetween(20, 30) || whereCustomer.LivesIn("Berlin")) && !whereCustomer.IsVIP());

And they’re still testable and mock-friendly. Now that we know I like this approach, let’s take a look on the implementation of specificatin pattern.

public class CustomerQuery: ICustomerQuery
{
	public ISpecification<Customer> MathesUserSearchFilters(string email, string name, int? ageFrom, int? ageTo, bool isTerminated)
	{
		var result = Specification<Customer>.TRUE;
		if(email != null)
			result &= new Specification<Customer>(x => x.email.ToLower() == email.ToLower());
		if(name != null)
			result &= new Specification<Customer>(x => x.Name.ToLower().Contains(name.ToLower());
		if(ageFrom != null)
			result &= IsOlderThan(ageFrom.Value);
		if(ageTo != null)
			result &= !IsOlderThan(ageTo.Value);
		if(isTerminated != null)
			result &= (isTerminated)?IsTerminated():!IsTerminated();
		return result;
	}
	public ISpecification<Customer> IsOlderThan(int yearOld) {/*..*/}
	public ISpecification<Customer> IsTerminated() {/*..*/}
}

Unlike Linq criteria, Specification plays incredibly well with building dynamic query! And that’s not the best part yet. These Specifications are not mere DB queries. Write once, use it everywhere. It can be used for object-filtering or validation.

var terminatedCustomers = customerList.FindAll(whereCustomer.IsTerminated()); 

Or:

Validate(customer, !whereCustomer.IsTerminated())
	.Message("The customer had been terminated. Please enter an active customer");

Code for sample Specification API can be found in Ritesh Rao’s post.

Where are we?

Oh yes, our initial objective: plugging in new screen/functionality in Open-Closed Principle fashion. Not a problem. The query can live in separate assembly, and it’s easy for the client to introduce their own set of Specifications that meets their querying needs for their plugins.
This approach also gives us the liberty to override the specification implementation, e.g. to comply with specific persistence technology, or database-structure. Say, if Customer.Name is implemented as FIRST_NAME and LAST_NAME columns. Overriding Specification implementation is not possible with “new” keyword or extension method approach, since the application is tightly coupled to specific Specification implementation.
This allows clients to extend the domain entity with their business-specific properties and persistence-structure.

About these ads

8 Comments »

  1. Hello.
    How specification approach resolve this problem
    >Exposing Linq to other layer will scatter the database concern all across application.
    ?
    repository.Query – will work only if i give him special database-orientated specification.
    The repository interface is not straightforward.

    Comment by Nikita Govorov — March 25, 2009 @ 10:19 am | Reply

    • Yes, specification implementation is DB-oriented, but I think you miss a point: specification _IS_ your data-layer. It is its responsiblity to provide query expressions that actually work with the database infrastructure. They’re also responsible to provide optimisation.

      In the other hand, exposing Linq out of data-access layer is a leak. Writing Linq expressions that can *work* with actual relational DB is difficult, let alone optimised. So you probably don’t want to get your business codes write Linq expressions any more than you want them to write HQL/SQL statements. It’s just too vulnerable to break and, in the case of linq, full of surprise. Ideally any changes to database should affect nothing else but data layer.

      It’s also the reason I use Specification factory. Application code only knows about Specification (factory) interfaces, and nothing about its implementations. Thus you can easily reconfigure IoC container to switch from NHibernate specification implementation to Linq2Sql, for example, although this is not the main intention. Just mere illustration of “unscattered” data-concern.

      Comment by hendryluk — March 25, 2009 @ 11:34 am | Reply

  2. In my case IRepository is part of the domain model layer(what about you?). And Repository(implementation of IRepository) is a data layer. Since I have exposed IRepository.Query, someone from domain logic can implement ISpecification and call IRepository.Query which will fails. Maybe your proposal is suitable if IRepository part of the application layer, but I’m not sure that they are there.

    Comment by Nikita Govorov — March 25, 2009 @ 8:36 pm | Reply

  3. I also have IRepository and ISpecification as part of domain model layer. Both Repository and Specification implementations are in data layer. No, I do not have have any data Specification implementation in domain layer or application layer.

    Consequently, Specification factory interface (ICustomerQuery) lives in domain model layer, but its implementation (WhereCustomer) in data-layer.

    Comment by hendryluk — March 25, 2009 @ 10:29 pm | Reply

  4. This is very up-to-date info. I’ll share it on Delicious.

    Comment by Vince Delmonte — April 15, 2009 @ 10:26 am | Reply

  5. [...] Filed under: Software Development — Hendry Luk @ 6:34 am I’m using Specification pattern for my data-access query, and while it is testable, they were anything but [...]

    Pingback by Don’t Limit Yourself to Mocking Frameworks « Hendry Luk — Sheep in Fence — April 20, 2009 @ 6:34 am | Reply

  6. >Exposing Linq to other layer will scatter the database concern all across application.

    How about having layer “Service” in between Repository and the Application?
    At least you won’t see linq query scattered across the application layer.

    Using your example:

    interface ICustomerRepository
    {
    IQueryable GetAll();
    }

    interface ICustomerService
    {
    IList GetCustomerByAge(int age);
    }

    public IList GetCustomerByAge(int age)
    {
    return repository.GetAll().Where(x=> x.Age <= age).ToList();
    }

    What’s your take regarding this approach?

    Comment by Hery — May 9, 2009 @ 9:13 am | Reply

    • >Exposing Linq to other layer will scatter the database concern all across application
      How about having layer “Service” in between Repository and the Application?

      Service is still part of ‘application’ I was refering to. And you actually just pointed out the first problem:

      return repository.GetAll().Where(x=> x.Age <= age).ToList();

      That wouldn’t work because x.Age can’t be translated into SQL using our HBM configuration. (DateOfBirth should be used instead). This is something that’s very closely dependent to how we configure our persistence infrastructure: a change to hbm configuration will be likely to break those kind of Linq expressions, and when that happens, it’d better be in data-access layer than anywhere else in the application.

      Second problem: that doesn’t really give us any Linq advantage over using IRepository. We still end up with methods like:
      GetCustomerByAgeRangeAndEmailAddressAndFullNameContainingAndIsTerminated(…), something I wanted to get rid of in the first place.

      Comment by Hendry Luk — May 9, 2009 @ 9:53 am | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: