NHibernate: Linq-ing Calculated Properties

Some properties are mapped onto database columns, while others are calculated. Here’s one simple example:

public class Parcel
{
   public PackagingType PackagingType {get; set;}
   public ParcelType ParcelType  {get; set; }
   public float Weight {get; set;}
   public decimal Fee
   {
      get {return (Weight - PackagingType.TareWeight) * ParcelType.FeePerWeight;
   }
}

“Fee” is a calculated property. You cannot use this property as a part of your Linq query. For instance, if you want to query Parcels with Fees above certain values:

var expensiveParcels = from parcel in Session.Query<Parcel> where parcel.Fee > 1000 select parcel;

Obviously that wouldn’t work because NHibernate does not recognize this “Fee” property.

There are 3 (conventional) ways to resolve this, you can choose either one of these:

  1. Change all your queries to avoid “Fee” property, and repeat the calculation logic (every time) instead.
    var expensiveParcels = from parcel in Session.Query<Parcel>
          where (parcel.Weight - parcel.PackagingType.TareWeight) * parcel.ParcelType.FeePerWeight > 1000
          select parcel;
    
  2. A slightly better way is to map “Fee” property with SQL Formula. It’s better because you do not have to repeat Fee calculation on every Linq query. E.g., in FluentNHibernate:
    Map(x=> x.Fee).Access.Readonly().Formula( // Raw SQL ->
        @"((select Weight - (select TareWeight from PackagingType p where  p.Id = PackagingId))
              * (select FeePerWeight from ParcelType pt where pt.Id = ParcelTypeId))");
    );
    
  3. Alternatively, you could also write an ILinqToHqlGenerator implementation.
    Since version 3.0, NHibernate has been utilizing ReLinq for its Linq-provider, which greatly improves its extensibility. It allows you to register your own ILinqToHqlGenerator, where you basically take a specific Linq expression and return whatever HqlTree expression you desire. For instance, in this example, we build our Hql expression like so:

    public class ParcelFeeLinqGenerator: BaseHqlGeneratorForProperty
    {
       public override HqlTreeNode BuildHql(MemberInfo member, Expression expression, HqlTreeBuilder treeBuilder, IHqlExpressionVisitor visitor)
       {
          var parcel = visitor.Visit(expression).AsExpression();
          return treeBuilder.Multiply(
              treeBuilder.Subtract(
                 treeBuilder.Dot(parcel, treeBuilder.Ident("Weight")),
                 treeBuilder.Dot(treeBuilder.Dot(parcel, treeBuilder.Ident("PackagingType")), treeBuilder.Ident("TareWeight"))
              ),
              treeBuilder.Dot(treeBuilder.Dot(parcel, treeBuilder.Ident("ParcelType")), treeBuilder.Ident("FeePerWeight"))
          );
       }
    }
    

    And you register this generator into NHibernate runtime to handle our Fee property:

    registry.RegisterGenerator(ReflectionHelper.GetProperty<Parcel, decimal>(x=> x.Fee), new ParcelFeeLinqGenerator());
    

All those 3 approaches do the job just fine, but whichever one you pick you’ll still end up duplicating your Fee calculation logic one way or another. You’ll either be repeating your logic in C# expressions (option#1), in SQL expression (option #2), or in HQL expression (option #3). That seems to violate DRY. It’s easy to forget to change your querying logic whenever your pricing rule changes (e.g. tax).

Better Way?

The ideal solution is to eliminate logic duplication. We want to be able to write the calculation logic only once to be shared both by the property as well as by Linq queries. I’ll be using approach#3 while employing a pattern to avoid duplicating our logic.

First step, I create this generic ILinqToHqlGenerator plumbing that I can reuse for all calculated properties in my projects.

public class CalculatedPropertyGenerator<T, TResult> : BaseHqlGeneratorForProperty
{
   public static void Register(ILinqToHqlGeneratorsRegistry registry, Expression<Func<T, TResult>> property, Expression<Func<T, TResult>> calculationExp)
   {
      registry.RegisterGenerator(ReflectionHelper.GetProperty(property), new CalculatedPropertyGenerator<T, TResult>{_calculationExp = calculationExp});
   }
   private CalculatedPropertyGenerator(){} // Private constructor

   private readonly Expression<Func<T, TResult>> _calculationExp;
   public override HqlTreeNode BuildHql(MemberInfo member, Expression expression, HqlTreeBuilder treeBuilder, IHqlExpressionVisitor visitor)
   {
      return visitor.Visit(_calculationExp);
   }
}

Once we got that plumbing class in place, we’ll then modify my parcel class slightly to look like the following:

public class Parcel
{
   public PackagingType PackagingType {get; set;}
   public ParcelType ParcelType  {get; set; }
   public float Weight {get; set;}

   /// <summary>
   /// To be reused for (NHibernate) Linq generator
   /// </summary>
   public static readonly Expression<Func<Parcel, decimal>> CalculateFeeExpression = x =>
          (x.Weight - x.PackagingType.TareWeight) / x.ParcelType.FeePerWeight;

   private static readonly Func<Parcel, decimal> CalculateFee = CalculateFeeExpression.Compile();
   public decimal Fee
   {
      get {return CalculateFee(this);
   }
}

Now you register this FeeCalculationExpression to NHibernate registry so that NHibernate can now translate Fee property using the same fee-calculation expression used by the Parcel class itself.

CalculatedPropertyGenerator<Parcel, double>.Register(registry, x=> x.Fee, Parcel.CalculateFeeExpression);

Now this Linq query works. NHibernate knows how to handle Fee property in the query.

var expensiveParcels = from parcel in Session.Query<Parcel> where parcel.Fee > 1000 select parcel;

We are reusing the same business-rule for calculating Fee property as well as to generate our NHibernate queries. There’s no duplication, and it requires very little setup as well (once you get the CalculatedPropertyGenerator plumbing class in place). This so far has been my favorite approach to map calculated-properties to NHibernate.

Advertisements

Software Development Fundamentals, Part 3: Object Relational Mapping

This is my first blog post since I mysteriously disappeared from blogosphere for more than a full year, and as much as it deserves an explanation, disappointingly the fact is much duller than the conspiracy theory of alien abduction. Desperate for an excuse, the word “busy” immediately springs to mind. I’ll leave it at that.

Anyway, where were we? Oh right, in my last post, I promised to write about O/RM in the context of software architecture. The promise was dated a year ago, and a lot of things have changed since then in the way we build applications. Many lessons have been learnt in our industry, most notably NoSql and CQRS architecture, which make any talk about O/RM today seem embarrassingly archaic. But I’m the man of my word, my pledge is my bond, so here we are. After all, despite all the new exotic NoSql and CQRS, ORM is still by large the mainstream of today’s way of developing application. Mainstream is everything this series is.

This post is not going to talk about how to use ORM products, their performance characteristics, or, god forbids, a reenactment of the classical war of ORM vs Stored Procedure. If you’re looking for introduction to NHibernate and its feature, sorry this post might not be for you. Instead, I’m here to talk about the impact ORM makes to application architecture, and why it’s absolutely mandatory for building domain-driven architecture.

(n)Hibernate

A natural way to start this post would be by asking the question: “What is ORM?”. But at second thought, it’s easier to define ORM by what it does, rather than what it is. So let’s just skip this question, and take Hibernate/nHibernate for our purpose, being the most popular ORMs of our time.

NHibernate is not a data-access framework as much as it is an Object Relational Mapping. ORM is based on the premise that we are writing our application using an object-oriented language, and that we want to keep it that way. Hibernate hides away the presence of relational database, so you can focus on modeling your business object. The persistence plumbing will simply be taken care of for you.

There Is No Database

There was a time where people used a term like “database aplication”. During that time, we typically had a database sitting at the core of the system, and the focus of our applications was to execute a bunch of SQL statements to transform “business-data”, and run some SQL queries to bring back the data to be presented on application UI.
We have moved past this kind of architectural mindset a long time ago, and began to realize that all we need is really just a place to keep objects. It shouldn’t be anymore complicated than that. We started asking question: why can’t we just store our objects in a big object Collection? (As in standard Java/.net Collection). Why can’t we just say:

  • Inserting a new customer: customers.Add(customer)
  • Fethcing it back: customer = customers.Where(c=> c.FirstName == “Piggy”).First()
  • Delete it: customers.Remove(customer)

It should be that simple: no SQL statement, no tables, no joins. Just a standard .net object collection. Now where that collection chooses to keep all those objects, that we simply don’t care. For all we know, it may be in the memory, or in the web session, or serialize them to file-system, or it may somehow store it to database tables… we don’t care. We only know we have a Collection of Customer objects, and as far as we’re concerned, there is no database.

Abstracting database to resemble object collection is incredibly hard. But it’s a solved problem. One of the most popular ORM products available today is Hiberate/nHibernate. Here’s an example of using nHibernate to update customer information and add a new account into it:

var customer = session.Linq<Customer>.Where(c=> c.CustomerNumber == custNo).List();
customer.CreditLimit += 500;
customer.AddAccount(new Account(accNo));

The first line above will invoke the following SQL query:

select * from FOH_CUSTOMERS c where c.CUSTOMER_NUMBER = ?

You also notice that there’s nothing in the code above to update the customer table, or to insert to the account table. It’s very much like accessing a normal object Collection: you pull an object from the collection, you mutate the object’s state, and that’s it: the object is already changed. The next time you grab the same customer, you’ll see the changed credit-limit, and the new account under the customer. You didn’t have to invoke update or anything, Hibernate will figure out the way to synchronize the data in your DB tables to reflect these changes.

NHibernate follows the “unit-of-work” pattern (which they call “Session“). Session is the nHibernate equivalent for object Collection. It represent a single isolated set of objects, meaning that if pull an object from a session and change the object’s state, the changes will be immediately visible to anyone using the same session. NHibernate keeps track of all these changes, and when we flush the session, it will do all the yoga to make sure all these changes are saved back to the central database, and therefore become visible to all other sessions. But our code does not have to be aware of any of that.

Therefore, in the example above, when you flush the session the following SQL statements will be executed:

update CRM_CUSTOMERS set CREDIT_LIMIT = ? where CUST_ID = ?;
insert into CRM_ACCOUTNS (ACCOUNT_ID, CUST_ID, ACCOUNT_NUMBER) values (?, ?, ?);

That was all what O/RM is, really. There are plenty other different features that various ORM products offer (lazy-load, futures, caching, associations, projection, batching, etc), but at the very core of all, the ultimate goal of every O/RM products is just that: the ability to “pretend” like a normal object Collection and to pretend like there is no database. At least that’s the goal, which is easier said than done. That feature might not sound such a big deal, but this is nonetheless the very essence of ORM, the one thing without which Domain-Driven-Design code would not be possible (on RDBMS).

O/RM And Domain Driven Design

I’m hoping to expand on DDD topic in later posts, but in a nutshell, our objects are not data bags. Your Customer object, a domain entity, might indeed hold some properties: name, email-address, accounts, credit-limits, etc, but more importantly, it represents the concept of Customer in your domain, and what really matters about it is its behaviors. It is domain behaviors that you’re trying to model as your domain objects. You don’t change the Address property of a Customer, instead the customer might move home. An Employee does not change its Salary property, instead the Employee might get a pay-rise, or maybe a promotion. We don’t just change the Balance of your bank account; we either Deposit(), Withdraw(), or Transfer() some money into bank accounts.

It is, however, extremely hard to write domain-driven code without ORM. I have worked in several projects without ORM on top of RDBMS, and I have to say that there can possibly be only one possible outcome: the objects are designed as normalised bags of properties whose role is merely to hold data to be loaded and saved into database tables. This anti-pattern popularly goes by the name of Anemic-Domain-Model.
The reason ORM takes an exclusive post in this blog series is exactly because I believe ORM is an absolute prerequisite of implementing DDD on top of relational persistence.

Let me present you with the ubiquitous sales-cart example. We’re an online book-store who sells books and magazine subscription. Let’s write a method to add a product into customer’s sales cart. Note that when we add a magazine subscription to the sales cart (which will charge weekly/monthly), the customer will have to re-read and re-agree on the Terms and Conditions of the sale again.

Here’s one implementation, without ORM:

public void AddToSalesCart(Customer customer, string productCode, int quantity)
{
   var salesCart = SalesCartDao.GetCartByCustomerId(customer.Id);
   var product = ProductDao.GetByCode(productCode);
   var cartItem = CartItemDAO.GetBySalesCartIdAndProductCode(salesCart.Id, productCode);

   if(cartItem != null)
   {
      cartItem.Quantity += quantity;
      CartItemDAO.Update(cartItem); // <- persistence
      }
   else
   {
      cartItem = new CartItem
      {
         salesCartId = salesCart.Id,
         ProductId = product.Id;
         Quantity = quantity;
      }
      CartItemDAO.Insert(cartItem); // <- persistence
   }

   if(product.IsSubscription && salesCart.IsTermsConditionAgreed)
   {
       salesCart.IsTermsConditionAgreed = false;
      SalesCartDao.Update(salesCart); // <- persistence
   }
}

The code above is written as a sequence of procedural instructions that puts values on dumb data-bags, which will in turn be used to generate update/insert SQL statements. This kind of code is known as transaction-script (anti-pattern). Unfortunately you cannot encapsulate this logic into domain methods (e.g. salesCart.AddProduct(product, quantity)), because then we don’t have a way to keep track of the changes that the method may make to object states (what table to update/insert). I.e., we have no way to synchronise the state changes back to the database. For this reason, all the objects need to stay as dumb as possible, only containing properties to hold data, and have no method.

ORM changes the game. It allows you to add behaviors to your domain-models because you no longer have to worry about keeping track of state changes. So the code above can be implemented into domain-driven code as such:

public void AddToSalesCart(Customer customer, string productCode, int quantity)
{
   var salesCart = salesCartRepository.Where(s=> s.Customer == customer).First();
   var product = productRepository.Where(p=> p.Code == productCode).First();

   salesCart.AddProduct(product, quantity);
}

public class SalesCart
{
   public void AddProduct(product, quantity)
   {
      var cartItem = items.Where(i=> i.Product == product).First();
      if(cartItem == null)
      {
         cartItem = new CartItem { Product = product};
         items.Add(cartItem);
      }
      cartItem.Quantity += quantity;

      if(product.IsSubscription)
         IsTermConditionAgreed = false;
   }
}

PS: I’ll discuss about “repository” further down, but for now, let’s just say I just renamed DAO into Repository.

In web applications, the code above would probably use session-per-request pattern. Here’s the brief flow of the pattern:

  1. When we receive a web-request from the client, the plumbing will start an NHibernateSession. This session will start to track all changes within this web-context.
  2. We pulls from repositories all domain entities that are required to perform the user request. (Line 3-4)
  3. We invoke domain methods on the entities. (Line 13-22)
  4. When we finish processing the request, and send the response back to the client, the plumbing will flush the session, thus saving all changes made in step #2 and #3 to the database.

This way, all the persistence tracking is done by the plumbing. Freed from traking and saving state changes, we are now able to implement our AddProduct logic as a domain method within SalesCart entity, which as you can see, contains no reference to any persistence concern (update/insert). The virtue of POCO/persitence-ignorance.
The application should not access any property under SalesCart directly. Everything has to go via domain methods of SalesCart, because it’s the Aggregate-Root, which we’ll discuss shorty.

Also notice another subtle thing. In the previous code, we reference entities by IDs (e.g. CustomerId, ProductId, SalesCartId), which demonstrates a very relational mindset. The reason it’s done that way is because referencing entities by objects would be an inefficient way from persistence viewpoint. You would have to load the whole entity even when ID would suffice. In the refactored code, objects associations are modeled in a natural way that reflects both domain-driven-design as well as just basic OOP we learned in school. ORM promotes this without compromising performance thanks to lazy-loading. I.e., the 2 following lines are almost exactly equivalent:

salesCart.CustomerId = customerId;
salesCart.Customer = session.Load<Customer>(customerId);

The second line does not make any database-call. It will only return a proxy with that customerId. The nice thing is, unlike customerId, the proxy object still acts as the actual Customer object: it will load from the database the first time we need it, e.g. when accessing salesCart.Customer.FirstName. This is yet another trick ORM pulls to pretend that “there is no database” without hurting performance.

Aggregate Root

SalesCart is an Aggregate-Root, another DDD concept. In essence, an Aggregate Root is an entity that consumers refer to directly, representing a consistency boundary. Aggregate-roots are the only kind of entity to which your application may hold a reference. Each aggregate-root composes and guards its sub-entities, and is persisted as a single unit. This helps avoid the mess (like in previous approach) because you now have a constraint that prevents you from creating a tight coupling to each individual sub-entities.
In our example, SalesCart is an aggregate-root. CartItem is not; it’s merely part of SalesCart aggregate. SalesCart is our single entry-point to the aggregate (e.g. to add a product). You can’t access CartItem directly outside the aggregate boundary, similary you don’t have repository or DAO for CartItem. It’s persisted as part of SalesCart (cascade update/insert/delete). Aggregate concept is a key rule that simplifies domain persistence greatly.

Infrastructure Is Skin Deep

After our previous posts, I hope by now we have agreed on one thing: infrastructure should sit at the outer skin of the application. Our infrastructure concern, in this case, is where and how we persist our domain objects. Before ORM, during the time when building an application was an act of writing some codes to execute a series of SQL statements to JDBC/ADO.NET, it was not possible to pull out database concerns away from our application code without making an unacceptable amount of degradation in performance.

ORM lets you to do exactly that. It hides away the database plumbing so it is not visible from the surface. It replaces the notion of database with something that looks and smells like a normal object Collection. In DDD, this collection of domain objects is known as “Repository”.

Repository

It’s a common mistake to take the term repository as another name for DAO. They might look similar but they are different in principle. If anything, Repository is another name for ICollection, and rightly so. Repeat 3 times, Repository == ICollection: a component that holds references to your objects, allowing you to Get/Find the objects back, and it keeps track of the changes and lifecycles of the objects. Just like an ICollection, it may have various implementations, like ArrayList, Dictionary, perhaps HttpSession, serialized files, or in our case, a relational database. These implementations are insignificant: they sit right at the outer skin of the application, and they are therefore swappable.

Just to remind you with the diagram from the previous post a year back:

IRepository<T> interface sits comfortably in Sheep.Domain.Services at the POCO core of the system. Using ORM, our repository is able to pretend to be a POCO collection. In Sheep.Infrastructure.Data, we have an implementation of the repository (NHibernateRepository<T>) that uses NHibernate to manage the persistence to a relational database. At runtime, this implementation will be injected to the core by IoC container. Note that Sheep.Infrastructure.Data is the only namespace with any reference to System.Data and NHibernate. Outside this namespace, IRepository pretends to be a POCO object collection.

Testability

ORM frameworks abstract your database plumbing into a unified abstraction such like standard object collection. Having this abstraction means that your code is not dependent to specific persistence mechanism. Linq is another language-level abstraction available to .net developers, which nHibernate also leverages. This combination, not only provides ability to substitute your persistence easily between in-memory, web-services, distributed-hash-table, and relational-database, but it also means you can replace the actual mechanism with much simpler fake implementation for testing purpose.

We’ll use the code from our previous code as an example:

public class AuthenticationService: IAuthenticationService
{
    IRepository<User> userRepository;
    ICommunicationService communicationService;

    public UserAuthenticationService(IRepository<User> userRepository, ICommunicationService communicationService)
    {
        this.userRepository = userRepository;
        this.communicationService = communicationService;
    }

    public void SendForgottenPassword(string username)
    {
        User user = userRepository.Where(u=> u.Username == username).First();
        user.ResetPinWith(PinGenerator.GenerateRandomPin());
        if(user != null)
            communicationService.SendEmail(user.EmailAddress, String.Format("Your new PIN is {0}", user.Pin));
    }
}

Here is the unit-test code:

// Arrange
ICommunicationService mockCommunicationService = MockRepository.CreateMock<ICommunicationService>();
IRepository<User> userRepository = new ArrayListRepository<User>();
var user = new User{UserName = “Sheep”, EmailAddress=”test@test.com”, Pin = “123123”};
userRepository.Add(user);

// Action
var authenticationService = new AuthenticationService(userRepository, mockCommunicationService);
authenticationService.SendForgottenPassword(“Sheep”);

// Assert
Assert.That(user.Pin, Is.Not.EqualTo(“123123”), “Password should be reset”);
mockCommunicationService.AssertCalled(x=> x.SendEmail(”test@test.com”, String.Format(“Your new PIN is {0}”, user.Pin));

As you can see, AuthenticationService in our unit-test above uses a simple implementation of IRepository (ArrayListRepository) via dependency-injection. Like ArrayList, this ArrayListRepository simply holds its objects in a variable-size memory list, and is not backed by any persistent database. During runtime, however, AuthenticationService will be using a repository that is backed by database-engine via ORM (e.g. NHibernateRepository). This is normally done by configuring your IoC container, but if it’s to be written by plain code, it would look like:

var authenticationService = new AuthenticationService(new NHibernateRepository<User>(nhSession), emailService);
authenticationService.SendForgottenPassword(“Sheep”);

This way, NHibernateRepository can sit right at the edge of the application. Our application code (AuthenticationService, in this case) does not have to be aware of the relational database, or the ORM. In term of .net speak, your domain projects should not have any reference to System.Data or NHibernate assemblies. Only your outmost layer (Infrastructure and IoC projects) should know about these assemblies.

Model First Development

That all POCO talk in ORM endorses a development flow that starts with you modeling your domain entities as POCO objects, focusing on shaping their behaviours and relationships, without touching any persistence concerns. This development style facilitates Agile, since we no longer need to use any database during early stage of development, which is by plugging Hibernate to an in-memory database (e.g. Sqlite, HSql, SqlCE), so we can focus on evolving our object models and behaviors without getting the friction from database schema in the way. We just go ahead and think in objects and behaviors; we don’t need to think about tables, foreign-keys, joins, CRUD, normalization, etc. Fluent NHibernate greatly smoothens out this methodology

Only at the later stage when we’re already happy with our business code that we would start looking at database infrastructure details, which is actually as simple as plugging our ORM to a real database (Oracle, SqlServer, MySql, etc). NHibernate will do the rest, including generating all the SQL to create all the table schemas for us.
In Agile, it’s highly imperative to delay infrastructure concerns for as long as possible to stay adaptive to changes.

Convention over Configuration

Common resistance of adopting ORM in projects is the sheer complexity of its configuration, particularly in Java side (Hibernate). It’s probably fine in Java: application development in Java is full of XML from nose to toes, verbose, and no one seems to mind about that. But that does not fit very well to .Net development culture where codes are concise, boilerplate codes are plague, XMLs are grotesque beasts, and developers aren’t particularly keen of spending whole afternoon on XML configs before even seeing any running application.

That drove the need for a tool like Fluent NHibernate that allows you to “configure” your ORM using convention. I’m not going to bore you with the detail, but in general, it almost frees you completely from having to configure anything for your ORM to just work and save your objects to your database like magic. It lends itself to model-first-development with zero-friction. You simply go ahead and write your domain objects, then NHibernate will figure out the rest. You can immediately save your objects to the database (and query it), without even touching any mapping configuration from your side. It all sounds too magical. Fluent NHibernate allows this by inspecting your entities and properties, and automatically uses conventions to automatically generate table schemas, column definitions, associations, and constraints for you (e.g. if you choose pluralization, a Person class will be automatically mapped into a generated table named CRM_PEOPLE). You can always then stray away from this convention in a case-per-case basis when necessary by overriding the configuration using its fluent API for your specific entities.

The same capability is also available in Entity Framework since .Net 4.0.

Finally

I have to admit I overhyped many things quite a bit there. Most ORMs suffer from what we call leaky abstraction. There is an impedance mismatch between relational database and object-oriented language. Abstracting relational database into ICollection look-alike is incredibly hard, which is why we need ORM in the first place. It needs a good understanding in how ORM works under the hood in order to use it effectively, and it is very hard. There are luckily some tools that can greatly assist you in doing so like NHibernate Profiler. It analyses the way you use NHibernate, and give you useful advices and warnings when it detects performance problems that might need to be looked at. The existence of such a dedicated commercial tool in a way only highlights how complex ORM framework like NHibernate is.

These complexities with ORM frameworks fueled CQRS architecture and NoSql movement. Now that I have finally got this ORM chat out of the way, hopefully I will get to write some post about these new topics, or rather, about what I have learnt from NoSql and CQRS buzz so far. And now about this series. Next posts will probably cover TDD and DDD, hopefully not after another year of disappearance.

Extensible Query with Specification Pattern

Writing finder methods for data repositories is problematic. The following is an example of pretty simplistic customer-search screen:
Email Address: [________]
Name: [_______]
Age: from [__] to [__] year old
Is terminated: [ ]Yes [ ]No
{Search}

Now how are we going to implement data query behind this search? I’ll go through several alternatives. (Jump to solution if you can’t care less). Let’s start from the simplest approach.

1. Repository Finder Methods

IList<Customer> result = customerRepository.SearchByBlaBlaBloodyBla(email, name, ageFrom, ageTo, isTerminated);

Let me give more context about what I’m working on. I’m developing an application that is intended to be an extensible vertical CRM framework. The client will provide their own new screens/functionalities by writing specific plugin implementation in their own separate assembly, without requiring modification on the framework (Open-Closed Principle).
The approach you just saw above tightly couples the repository API with specific UI design. Any change with the design of the search screen will require the repository API to be reworked. And should we create one repository method for each search screen? Furthermore, when a new search screen or report functionality is plugged into the system by adding new plugin, we need to somehow extend the data repository API to cover each of those specific screen scenarios. This is not an easily extensible architecture.
The upside, this approach is simple and very easy to mock for unit-test. When flexibility is not an issue, I would go for this approach.

2. Lambda-Expression Repository

IList<Customer> result = repository.FindAll<Customer>(
	x => x.EmailAddress == emailAddress && x.IsTerminated == isTerminated);  // and so on

This code uses repository API from FluentNHibernate. I like this API because we only have one single general-purpose Repository. It decouples the repository completely from specific UI design. However I’m not comfortably about leaking naked Linq outside of repository. Exposing Linq to other layer will scatter the database concern all across application. Let’s consider what happens if we decides to refactor IsTerminated property that is currently implemented as a column in DB into C# code, say:

public bool IsTerminated {get {return this.TerminationDate != null; }};

The earlier Linq statement (possibly scattered all over the place) will start to fail since Linq is unable to map IsTerminated property into a correct SQL where clause.

3. Pipe and Filter Pattern

IQueryable<Customer> result = repository.All<Customer>().WithEmail(emailAddress).AgeAbove(ageFrom).AgeBelow(ageTo);
if(isTerminated != null)
	result = (isTerminated)? result.IsTerminated(): result.IsNotTerminated();	

Or in this case, that should be wrapped into:

IList<Customer> result = repository.All<Customer>().WithBlaBloodyBla(emailAddress, name, ageFrom, ageTo, isTerminated).List();

This approach leverages fluent IQueryable and extenssion-methods. It still exposes IQueryable which leaks database concern outside repository, but it’s much better since the query is properly encapsulated behind easy-to-read and maintainable extenssion methods.
In the above example with isTerminated check, it’s obvious that this approach is doing pretty well in handling dynamic query that is very difficult to express using previous lambda-expression approach. But the flexibility is pretty limited, or to be specific, you can only chain multiple filters in an AND relationship.
Another problem of this approach, which is actually the main reason I steer away from approaches #2 #3 #4, is on unit-testing. Yes it is very easy to unit-test each of the filter independently, but it is extremely difficult to mock out those filters to unit-test services that depends on it. I’ll describe the problem in the next approach.

4. Specification pattern

IList<Customer> result = repository.FindAll<Customer>(new WithinAgeCustomerSpecification(ageFrom, ageTo));

My prefered solution is largely derived from specification pattern, I’ll give extra highlight to this approach later. Long story short, IMO this approach is best since it doesn’t leak any data-concern and linq to outside repository. It also separates the responsiblities between loading/saving domain entity (repository) and querying (specification). I’ll start with the problem.
As mentioned, it’s very easy to unit-test each of the specification using the infamous in-memory/sqlite repository testing. But it’s incredibly difficult to unit-test the UI controller and application layer that uses the specification.
Just to give a concrete illustration, this is how I write unit-test had I used the approach #1. (Simplified to search age only)

customerRepository.Expect(x => x.SearchByAgeBetween(20, 30)).Return(stubCustomers);
ViewResult view = customerSearchController.Search(20, 30); 
Assert.That(view.Model, Is.EqualTo(stubCustomers);

But anyone has suggestion how I could test the following controller (simplified)?

public class CustomerController: Controller
{
	IRepository repository; //Injected
	
	public ActionResult Search(int? ageFrom, int? ageTo)
	{
		var customers = repository.Query(new WithinAgeCustomerSpecification(ageFrom, ageTo));
		return View("search", customers);
	}
}

That little call to “new WithinAgeCustomerSpecification(..)” makes it virtually impossible to mock the specification and take it out from the test concern. Linq and Extension method in approach #2 and #3 certainly don’t help.
Why do we care to mock the specification? Because, mind you again, testing queries _IS_ painful! It’s tedious to setup stub-data and verify query result. Each of the specification has had this kind of unit-test themselves, and we certainly _DO_NOT_ want to repeat the test in the controller. For sake of discussion, this is how the unit-test for the controller would look like using unmocked Specification.

ShouldLoadSearchView();
CanLoadCustomerByEmailAddressToViewData();
CanLoadCustomerByNameToViewData();
CanLoadCustomerByAgeToViewData();
CanLoadCustomerByNameAndEmailAddressToViewData();
// etc etc

Each test-case deals with tedious in-memory/sqlite stub data. I don’t even understand why I need to care about data and Sqlite to unit-test UI/Application layer. It just doesn’t make sense.
And guess how the unit-test for the specification looks like.

CanSearchByEmailAddress();
CanSearchByName();
CanSearchByAge();
CanSearchByNameAndEmailAddress();
// etc etc

That’s right, duplication. Not to mention tediously data-driven. Generally, you want to avoid testing that involves data and query. For comparisson, this is how the test for the controller would look like with mocked specification.

ShouldLoadSearchView();
ShouldSearchCustomerUsingCorrectSpecification();
ShouldLoadSearchResultToViewData();

Yes, that’s all we care: “controller should use correct specification to search the customer”. We don’t care if the specification actually does what it claims it does. That’s for other developers to care.

Solution

By wrapping Specifications into a factory, we decouple the controller from Specification implementation.

public class CustomerController: Controller
{
	IRepository repository; //Injected
	ICustomerSpecFactory whereCustomer; //Injected
	
	public ActionResult Search(int? ageFrom, int? ageTo)
	{
		var customers = repository.Query(whereCustomer.HasAgeBetween(ageFrom, ageTo));
		return View("search", customers);
	}
}

Unit-test is a breeze.

whereCustomer.Expect(x => x.HasAgeBetween(20, 30))
	.Return(stubSpec = MockRepository.Generate<Specification<Customer>>);
customerRepository.Expect(x => x.Query(stubSpec)).Return(stubCustomers);

var view = customerController.Search(20, 30); // EXECUTE CONTROLLER ACTION
Assert.That(view.Model, Is.EqualTo(stubCustomer);

EDIT: I posted a better way to write unit-test for this

I actually like it a lot! The specification is also amazingly flexible to mix and play. E.g.:

repository.Query((whereCustomer.HasAgeBetween(20, 30) || whereCustomer.LivesIn("Berlin")) && !whereCustomer.IsVIP());

And they’re still testable and mock-friendly. Now that we know I like this approach, let’s take a look on the implementation of specificatin pattern.

public class CustomerQuery: ICustomerQuery
{
	public ISpecification<Customer> MathesUserSearchFilters(string email, string name, int? ageFrom, int? ageTo, bool isTerminated)
	{
		var result = Specification<Customer>.TRUE;
		if(email != null)
			result &= new Specification<Customer>(x => x.email.ToLower() == email.ToLower());
		if(name != null)
			result &= new Specification<Customer>(x => x.Name.ToLower().Contains(name.ToLower());
		if(ageFrom != null)
			result &= IsOlderThan(ageFrom.Value);
		if(ageTo != null)
			result &= !IsOlderThan(ageTo.Value);
		if(isTerminated != null)
			result &= (isTerminated)?IsTerminated():!IsTerminated();
		return result;
	}
	public ISpecification<Customer> IsOlderThan(int yearOld) {/*..*/}
	public ISpecification<Customer> IsTerminated() {/*..*/}
}

Unlike Linq criteria, Specification plays incredibly well with building dynamic query! And that’s not the best part yet. These Specifications are not mere DB queries. Write once, use it everywhere. It can be used for object-filtering or validation.

var terminatedCustomers = customerList.FindAll(whereCustomer.IsTerminated()); 

Or:

Validate(customer, !whereCustomer.IsTerminated())
	.Message("The customer had been terminated. Please enter an active customer");

Code for sample Specification API can be found in Ritesh Rao’s post.

Where are we?

Oh yes, our initial objective: plugging in new screen/functionality in Open-Closed Principle fashion. Not a problem. The query can live in separate assembly, and it’s easy for the client to introduce their own set of Specifications that meets their querying needs for their plugins.
This approach also gives us the liberty to override the specification implementation, e.g. to comply with specific persistence technology, or database-structure. Say, if Customer.Name is implemented as FIRST_NAME and LAST_NAME columns. Overriding Specification implementation is not possible with “new” keyword or extension method approach, since the application is tightly coupled to specific Specification implementation.
This allows clients to extend the domain entity with their business-specific properties and persistence-structure.