Software Development Fundamentals, Part 3: Object Relational Mapping

This is my first blog post since I mysteriously disappeared from the blogosphere for more than a full year, and as much as it deserves an explanation, the truth is disappointingly much duller than a conspiracy theory about alien abduction. Desperate for an excuse, the word “busy” immediately springs to mind. I’ll leave it at that.

Anyway, where were we? Oh right, in my last post, I promised to write about O/RM in the context of software architecture. The promise was dated a year ago, and a lot has changed since then in the way we build applications. Many lessons have been learnt in our industry, most notably NoSql and CQRS architecture, which make any talk about O/RM today seem embarrassingly archaic. But I’m a man of my word, my pledge is my bond, so here we are. After all, despite all the exotic new NoSql and CQRS, ORM is still by and large the mainstream way of developing applications today. And mainstream is what this series is all about.

This post is not going to talk about how to use ORM products, their performance characteristics, or, god forbid, a reenactment of the classic war of ORM vs Stored Procedure. If you’re looking for an introduction to NHibernate and its features, sorry, this post might not be for you. Instead, I’m here to talk about the impact ORM makes on application architecture, and why it’s absolutely mandatory for building a domain-driven architecture.

(n)Hibernate

A natural way to start this post would be by asking the question: “What is ORM?”. But on second thought, it’s easier to define ORM by what it does rather than by what it is. So let’s just skip this question and take Hibernate/nHibernate, the most popular ORMs of our time, as our working example.

NHibernate is not a data-access framework so much as it is an Object Relational Mapper. ORM is based on the premise that we are writing our application in an object-oriented language, and that we want to keep it that way. Hibernate hides away the presence of the relational database, so you can focus on modeling your business objects. The persistence plumbing will simply be taken care of for you.

There Is No Database

There was a time when people used a term like “database application”. During that time, we typically had a database sitting at the core of the system, and the focus of our applications was to execute a bunch of SQL statements to transform “business-data”, and to run some SQL queries to bring the data back to be presented on the application UI.
We moved past that architectural mindset a long time ago, and began to realize that all we really need is just a place to keep objects. It shouldn’t be any more complicated than that. We started asking the question: why can’t we just store our objects in a big object Collection (as in a standard Java/.net Collection)? Why can’t we just say:

  • Inserting a new customer: customers.Add(customer)
  • Fetching it back: customer = customers.Where(c=> c.FirstName == "Piggy").First()
  • Deleting it: customers.Remove(customer)

It should be that simple: no SQL statement, no tables, no joins. Just a standard .net object collection. Now where that collection chooses to keep all those objects, that we simply don’t care. For all we know, it may be in the memory, or in the web session, or serialize them to file-system, or it may somehow store it to database tables… we don’t care. We only know we have a Collection of Customer objects, and as far as we’re concerned, there is no database.

Abstracting a database to resemble an object collection is incredibly hard. But it’s a solved problem. One of the most popular ORM products available today is Hibernate/nHibernate. Here’s an example of using nHibernate to update a customer’s information and add a new account to it:

var customer = session.Linq<Customer>().Where(c=> c.CustomerNumber == custNo).First();
customer.CreditLimit += 500;
customer.AddAccount(new Account(accNo));

The first line above will invoke the following SQL query:

select * from CRM_CUSTOMERS c where c.CUSTOMER_NUMBER = ?

You’ll also notice that there’s nothing in the code above to update the customer table, or to insert into the account table. It’s very much like accessing a normal object Collection: you pull an object from the collection, you mutate the object’s state, and that’s it: the object is already changed. The next time you grab the same customer, you’ll see the changed credit-limit, and the new account under the customer. You didn’t have to invoke an update or anything; Hibernate will figure out how to synchronize the data in your DB tables to reflect these changes.

NHibernate follows the “unit-of-work” pattern (which it calls “Session”). The Session is the nHibernate equivalent of an object Collection. It represents a single isolated set of objects, meaning that if you pull an object from a session and change the object’s state, the change is immediately visible to anyone using the same session. NHibernate keeps track of all these changes, and when we flush the session, it will do all the yoga to make sure all these changes are saved back to the central database, and therefore become visible to all other sessions. But our code does not have to be aware of any of that.

Therefore, in the example above, when you flush the session the following SQL statements will be executed:

update CRM_CUSTOMERS set CREDIT_LIMIT = ? where CUST_ID = ?;
insert into CRM_ACCOUNTS (ACCOUNT_ID, CUST_ID, ACCOUNT_NUMBER) values (?, ?, ?);
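Putting it all together, the full unit-of-work cycle around that snippet would look roughly like this (a minimal sketch; in a real application this plumbing usually lives in the infrastructure rather than in business code, and sessionFactory/custNo/accNo are assumed to be in scope):

using (var session = sessionFactory.OpenSession())
using (var transaction = session.BeginTransaction())
{
   var customer = session.Linq<Customer>()
                         .Where(c => c.CustomerNumber == custNo)
                         .First();

   customer.CreditLimit += 500;             // tracked by the session
   customer.AddAccount(new Account(accNo)); // also tracked

   session.Flush();       // synchronizes all tracked changes back to the database
   transaction.Commit();
}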

That’s all O/RM really is. There are plenty of other features that various ORM products offer (lazy-load, futures, caching, associations, projection, batching, etc), but at the very core of it all, the ultimate goal of every O/RM product is just that: the ability to “pretend” to be a normal object Collection and to pretend that there is no database. At least that’s the goal, which is easier said than done. That feature might not sound like such a big deal, but it is nonetheless the very essence of ORM, the one thing without which Domain-Driven-Design code would not be possible (on an RDBMS).

O/RM And Domain Driven Design

I’m hoping to expand on the DDD topic in later posts, but in a nutshell, our objects are not data bags. Your Customer object, a domain entity, might indeed hold some properties: name, email-address, accounts, credit-limits, etc, but more importantly, it represents the concept of a Customer in your domain, and what really matters about it is its behaviors. It is domain behaviors that you’re trying to model as your domain objects. You don’t change the Address property of a Customer; instead the customer might move home. An Employee does not change its Salary property; instead the Employee might get a pay-rise, or maybe a promotion. We don’t just change the Balance of a bank account; we Deposit(), Withdraw(), or Transfer() money.
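To make the contrast concrete, here is a minimal illustrative sketch (the class and its rules are made up for the example) of a bank account modeled around behaviors rather than property setters:

public class BankAccount
{
   public decimal Balance { get; private set; }

   // Behavior, not a property setter: the domain rule lives with the entity.
   public void Deposit(decimal amount)
   {
      if (amount <= 0)
         throw new ArgumentException("Deposit amount must be positive");
      Balance += amount;
   }

   public void Withdraw(decimal amount)
   {
      if (amount > Balance)
         throw new InvalidOperationException("Insufficient funds");
      Balance -= amount;
   }
}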

It is, however, extremely hard to write domain-driven code without ORM. I have worked on several projects without ORM on top of an RDBMS, and I have to say that there is only one possible outcome: the objects are designed as normalised bags of properties whose role is merely to hold data to be loaded and saved into database tables. This anti-pattern popularly goes by the name of Anemic Domain Model.
The reason ORM takes an exclusive post in this blog series is exactly because I believe ORM is an absolute prerequisite of implementing DDD on top of relational persistence.

Let me present you with the ubiquitous sales-cart example. We’re an online book-store that sells books and magazine subscriptions. Let’s write a method to add a product to a customer’s sales cart. Note that when we add a magazine subscription to the sales cart (which will charge weekly/monthly), the customer has to re-read and re-agree to the Terms and Conditions of the sale.

Here’s one implementation, without ORM:

public void AddToSalesCart(Customer customer, string productCode, int quantity)
{
   var salesCart = SalesCartDao.GetCartByCustomerId(customer.Id);
   var product = ProductDao.GetByCode(productCode);
   var cartItem = CartItemDAO.GetBySalesCartIdAndProductCode(salesCart.Id, productCode);

   if(cartItem != null)
   {
      cartItem.Quantity += quantity;
      CartItemDAO.Update(cartItem); // <- persistence
   }
   else
   {
      cartItem = new CartItem
      {
         SalesCartId = salesCart.Id,
         ProductId = product.Id,
         Quantity = quantity
      };
      CartItemDAO.Insert(cartItem); // <- persistence
   }

   if(product.IsSubscription && salesCart.IsTermsConditionAgreed)
   {
      salesCart.IsTermsConditionAgreed = false;
      SalesCartDao.Update(salesCart); // <- persistence
   }
}

The code above is written as a sequence of procedural instructions that puts values into dumb data-bags, which will in turn be used to generate update/insert SQL statements. This kind of code is known as a transaction-script (anti-pattern). Unfortunately, you cannot encapsulate this logic into domain methods (e.g. salesCart.AddProduct(product, quantity)), because then we wouldn’t have a way to keep track of the changes the method makes to object states (which tables to update/insert). I.e., we would have no way to synchronise the state changes back to the database. For this reason, all the objects need to stay as dumb as possible, only containing properties to hold data, and no methods.

ORM changes the game. It allows you to add behaviors to your domain-models because you no longer have to worry about keeping track of state changes. So the code above can be implemented into domain-driven code as such:

public void AddToSalesCart(Customer customer, string productCode, int quantity)
{
   var salesCart = salesCartRepository.Where(s=> s.Customer == customer).First();
   var product = productRepository.Where(p=> p.Code == productCode).First();

   salesCart.AddProduct(product, quantity);
}

public class SalesCart
{
   private readonly IList<CartItem> items = new List<CartItem>();

   public void AddProduct(Product product, int quantity)
   {
      var cartItem = items.Where(i=> i.Product == product).FirstOrDefault();
      if(cartItem == null)
      {
         cartItem = new CartItem { Product = product };
         items.Add(cartItem);
      }
      cartItem.Quantity += quantity;

      if(product.IsSubscription)
         IsTermsConditionAgreed = false;
   }
}

PS: I’ll discuss “repository” further down, but for now, let’s just say I renamed DAO to Repository.

In web applications, the code above would probably use the session-per-request pattern. Here’s the brief flow of the pattern (a minimal sketch of the plumbing follows the list):

  1. When we receive a web-request from the client, the plumbing will start an NHibernateSession. This session will start to track all changes within this web-context.
  2. We pull from repositories all the domain entities required to perform the user request (the two repository calls at the top of AddToSalesCart).
  3. We invoke domain methods on the entities (salesCart.AddProduct).
  4. When we finish processing the request, and send the response back to the client, the plumbing will flush the session, thus saving all changes made in step #2 and #3 to the database.
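A minimal sketch of that plumbing in classic ASP.NET might look like the following (illustrative only; the module and SessionFactoryHolder names are made up, and real projects usually delegate this to their IoC/infrastructure layer):

public class NHibernateSessionModule : IHttpModule
{
   public void Init(HttpApplication context)
   {
      context.BeginRequest += (sender, e) =>
      {
         // 1. Open a session when the request starts and stash it in the request context.
         HttpContext.Current.Items["nh.session"] = SessionFactoryHolder.Factory.OpenSession();
      };

      context.EndRequest += (sender, e) =>
      {
         // 4. Flush and dispose the session when the response goes out,
         //    saving every change tracked during steps 2 and 3.
         var session = (ISession)HttpContext.Current.Items["nh.session"];
         if (session == null) return;
         session.Flush();
         session.Dispose();
      };
   }

   public void Dispose() { }
}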

This way, all the persistence tracking is done by the plumbing. Freed from tracking and saving state changes, we are now able to implement our AddProduct logic as a domain method within the SalesCart entity, which, as you can see, contains no reference to any persistence concern (update/insert). The virtue of POCO/persistence-ignorance.
The application should not access any property under SalesCart directly. Everything has to go via the domain methods of SalesCart, because it’s the Aggregate-Root, which we’ll discuss shortly.

Also notice another subtle thing. In the previous code, we referenced entities by IDs (e.g. CustomerId, ProductId, SalesCartId), which demonstrates a very relational mindset. The reason it’s done that way is that referencing entities by object would be inefficient from a persistence viewpoint: you would have to load the whole entity even when the ID would suffice. In the refactored code, object associations are modeled in a natural way that reflects both domain-driven-design and the basic OOP we learned in school. ORM promotes this without compromising performance, thanks to lazy-loading. I.e., the following 2 lines are almost exactly equivalent:

salesCart.CustomerId = customerId;
salesCart.Customer = session.Load<Customer>(customerId);

The second line does not make any database-call. It will only return a proxy with that customerId. The nice thing is, unlike customerId, the proxy object still acts as the actual Customer object: it will load from the database the first time we need it, e.g. when accessing salesCart.Customer.FirstName. This is yet another trick ORM pulls to pretend that “there is no database” without hurting performance.

Aggregate Root

SalesCart is an Aggregate-Root, another DDD concept. In essence, an Aggregate Root is an entity that consumers refer to directly, representing a consistency boundary. Aggregate-roots are the only kind of entity to which your application may hold a reference. Each aggregate-root composes and guards its sub-entities, and is persisted as a single unit. This helps avoid the mess (like in the previous approach) because you now have a constraint that prevents you from creating a tight coupling to each individual sub-entity.
In our example, SalesCart is an aggregate-root. CartItem is not; it’s merely part of the SalesCart aggregate. SalesCart is our single entry-point to the aggregate (e.g. to add a product). You can’t access CartItem directly outside the aggregate boundary; similarly, you don’t have a repository or DAO for CartItem. It’s persisted as part of SalesCart (cascade update/insert/delete). The aggregate concept is a key rule that greatly simplifies domain persistence.
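The post doesn’t show the mapping, but with something like Fluent NHibernate the “persisted as a single unit” rule roughly translates into a cascade on the aggregate root’s mapping (an illustrative sketch, assuming SalesCart exposes a read-only Items collection backed by its private items field):

public class SalesCartMap : ClassMap<SalesCart>
{
   public SalesCartMap()
   {
      Id(x => x.Id);
      Map(x => x.IsTermsConditionAgreed);
      References(x => x.Customer);

      // CartItem has no mapping, repository or DAO of its own:
      // it lives and dies with its SalesCart.
      HasMany(x => x.Items)
         .Cascade.AllDeleteOrphan()
         .Access.CamelCaseField();
   }
}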

Infrastructure Is Skin Deep

After our previous posts, I hope by now we have agreed on one thing: infrastructure should sit at the outer skin of the application. Our infrastructure concern, in this case, is where and how we persist our domain objects. Before ORM, back when building an application was an act of writing some code to execute a series of SQL statements against JDBC/ADO.NET, it was not possible to pull database concerns away from our application code without an unacceptable degradation in performance.

ORM lets you do exactly that. It hides away the database plumbing so it is not visible from the surface. It replaces the notion of a database with something that looks and smells like a normal object Collection. In DDD, this collection of domain objects is known as a “Repository”.

Repository

It’s a common mistake to take the term repository as just another name for DAO. They might look similar, but they are different in principle. If anything, Repository is another name for ICollection, and rightly so. Repeat 3 times: Repository == ICollection: a component that holds references to your objects, allows you to Get/Find them back, and keeps track of the changes and lifecycles of the objects. Just like an ICollection, it may have various implementations: ArrayList, Dictionary, perhaps HttpSession, serialized files, or, in our case, a relational database. These implementations are insignificant: they sit right at the outer skin of the application, and they are therefore swappable.

Just to remind you with the diagram from the previous post a year back:

The IRepository<T> interface sits comfortably in Sheep.Domain.Services at the POCO core of the system. Using ORM, our repository is able to pretend to be a POCO collection. In Sheep.Infrastructure.Data, we have an implementation of the repository (NHibernateRepository<T>) that uses NHibernate to manage the persistence to a relational database. At runtime, this implementation will be injected into the core by the IoC container. Note that Sheep.Infrastructure.Data is the only namespace with any reference to System.Data and NHibernate. Outside this namespace, IRepository pretends to be a POCO object collection.
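Neither the interface nor the implementation is shown here, but a minimal sketch of the idea might look like this (illustrative; the real Sheep code will differ):

// Sheep.Domain.Services — pure POCO, no NHibernate reference.
public interface IRepository<T>
{
   void Add(T entity);
   void Remove(T entity);
   IQueryable<T> Where(Expression<Func<T, bool>> predicate);
}

// Sheep.Infrastructure.Data — the only place that knows about NHibernate.
public class NHibernateRepository<T> : IRepository<T>
{
   private readonly ISession session;

   public NHibernateRepository(ISession session)
   {
      this.session = session;
   }

   public void Add(T entity)    { session.Save(entity); }
   public void Remove(T entity) { session.Delete(entity); }

   public IQueryable<T> Where(Expression<Func<T, bool>> predicate)
   {
      return session.Linq<T>().Where(predicate);
   }
}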

Testability

ORM frameworks abstract your database plumbing into a unified abstraction, much like a standard object collection. Having this abstraction means that your code is not dependent on a specific persistence mechanism. Linq is another language-level abstraction available to .net developers, which nHibernate also leverages. This combination not only provides the ability to substitute your persistence easily between in-memory, web-service, distributed-hash-table, and relational-database implementations, but it also means you can replace the actual mechanism with a much simpler fake implementation for testing purposes.

We’ll use the code from our previous post as an example:

public class AuthenticationService: IAuthenticationService
{
    IRepository<User> userRepository;
    ICommunicationService communicationService;

    public AuthenticationService(IRepository<User> userRepository, ICommunicationService communicationService)
    {
        this.userRepository = userRepository;
        this.communicationService = communicationService;
    }

    public void SendForgottenPassword(string username)
    {
        User user = userRepository.Where(u=> u.Username == username).FirstOrDefault();
        if(user != null)
        {
            user.ResetPinWith(PinGenerator.GenerateRandomPin());
            communicationService.SendEmail(user.EmailAddress, String.Format("Your new PIN is {0}", user.Pin));
        }
    }
}

Here is the unit-test code:

// Arrange
ICommunicationService mockCommunicationService = MockRepository.GenerateMock<ICommunicationService>();
IRepository<User> userRepository = new ArrayListRepository<User>();
var user = new User{Username = "Sheep", EmailAddress = "test@test.com", Pin = "123123"};
userRepository.Add(user);

// Action
var authenticationService = new AuthenticationService(userRepository, mockCommunicationService);
authenticationService.SendForgottenPassword("Sheep");

// Assert
Assert.That(user.Pin, Is.Not.EqualTo("123123"), "Password should be reset");
mockCommunicationService.AssertWasCalled(x=> x.SendEmail("test@test.com", String.Format("Your new PIN is {0}", user.Pin)));

As you can see, AuthenticationService in our unit-test above uses a simple implementation of IRepository (ArrayListRepository) via dependency-injection. Like ArrayList, this ArrayListRepository simply holds its objects in a variable-size in-memory list, and is not backed by any persistent database. At runtime, however, AuthenticationService will be using a repository that is backed by a database engine via ORM (e.g. NHibernateRepository). This is normally done by configuring your IoC container, but written in plain code, it would look like:

var authenticationService = new AuthenticationService(new NHibernateRepository<User>(nhSession), emailService);
authenticationService.SendForgottenPassword("Sheep");

This way, NHibernateRepository can sit right at the edge of the application. Our application code (AuthenticationService, in this case) does not have to be aware of the relational database, or the ORM. In .net terms, your domain projects should not have any reference to the System.Data or NHibernate assemblies. Only your outermost layer (Infrastructure and IoC projects) should know about these assemblies.
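For completeness, the ArrayListRepository used in the test above isn’t shown in the post either; a minimal in-memory sketch (matching the IRepository<T> sketch earlier) could be as simple as:

public class ArrayListRepository<T> : IRepository<T>
{
   private readonly List<T> items = new List<T>();

   public void Add(T entity)    { items.Add(entity); }
   public void Remove(T entity) { items.Remove(entity); }

   public IQueryable<T> Where(Expression<Func<T, bool>> predicate)
   {
      // Same contract as the NHibernate-backed repository, but backed by a plain in-memory list.
      return items.AsQueryable().Where(predicate);
   }
}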

Model First Development

All that POCO talk in ORM endorses a development flow that starts with modeling your domain entities as POCO objects, focusing on shaping their behaviours and relationships, without touching any persistence concern. This development style facilitates Agile, since we no longer need a real database during the early stages of development: we simply plug NHibernate into an in-memory database (e.g. Sqlite, HSql, SqlCE), so we can focus on evolving our object models and behaviors without the friction of a database schema getting in the way. We just go ahead and think in objects and behaviors; we don’t need to think about tables, foreign-keys, joins, CRUD, normalization, etc. Fluent NHibernate greatly smooths out this methodology.

Only at a later stage, when we’re already happy with our business code, would we start looking at database infrastructure details, which is actually as simple as plugging our ORM into a real database (Oracle, SqlServer, MySql, etc). NHibernate will do the rest, including generating all the SQL to create the table schemas for us.
In Agile, it’s imperative to delay infrastructure concerns for as long as possible to stay adaptive to changes.

Convention over Configuration

A common source of resistance to adopting ORM in projects is the sheer complexity of its configuration, particularly on the Java side (Hibernate). That’s probably fine in Java: application development in Java is full of XML from nose to toes, verbose, and no one seems to mind. But it does not fit very well with the .Net development culture, where code is concise, boilerplate code is a plague, XMLs are grotesque beasts, and developers aren’t particularly keen on spending a whole afternoon on XML configs before even seeing a running application.

That drove the need for a tool like Fluent NHibernate, which allows you to “configure” your ORM using conventions. I’m not going to bore you with the detail, but in general, it almost completely frees you from having to configure anything for your ORM to just work and save your objects to your database like magic. It lends itself to model-first development with zero friction. You simply go ahead and write your domain objects, and NHibernate will figure out the rest. You can immediately save your objects to the database (and query them), without even touching any mapping configuration on your side. It all sounds too magical. Fluent NHibernate allows this by inspecting your entities and properties, and uses conventions to automatically generate table schemas, column definitions, associations, and constraints for you (e.g. if you choose pluralization, a Person class will be automatically mapped to a generated table named CRM_PEOPLE). You can always stray away from this convention on a case-by-case basis when necessary, by overriding the configuration using its fluent API for your specific entities.
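As a rough illustration of how little configuration that can mean, a Fluent NHibernate auto-mapping bootstrap might look something like this (a sketch, not a complete setup; the entity assembly and in-memory database choice are assumptions):

var sessionFactory = Fluently.Configure()
   .Database(SQLiteConfiguration.Standard.InMemory())                      // swap for Oracle/SqlServer/MySql later
   .Mappings(m => m.AutoMappings.Add(AutoMap.AssemblyOf<Customer>()))      // inspect entities, apply conventions
   .ExposeConfiguration(cfg => new SchemaExport(cfg).Create(false, true))  // let NHibernate generate the schema
   .BuildSessionFactory();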

The same capability is also available in Entity Framework since .Net 4.0.

Finally

I have to admit I overhyped things quite a bit there. Most ORMs suffer from what we call leaky abstraction. There is an impedance mismatch between the relational database and the object-oriented language. Abstracting a relational database into an ICollection look-alike is incredibly hard, which is why we need ORM in the first place. It takes a good understanding of how an ORM works under the hood to use it effectively, and that is genuinely hard. Luckily there are tools that can greatly assist you, like NHibernate Profiler. It analyses the way you use NHibernate, and gives you useful advice and warnings when it detects performance problems that might need to be looked at. The existence of such a dedicated commercial tool in a way only highlights how complex an ORM framework like NHibernate is.

These complexities with ORM frameworks fueled the CQRS architecture and the NoSql movement. Now that I have finally got this ORM chat out of the way, hopefully I will get to write some posts about these new topics, or rather, about what I have learnt from the NoSql and CQRS buzz so far. And now about this series: the next posts will probably cover TDD and DDD, hopefully not after another year of disappearance.


Data Access Test with Sqlite

I briefly mentioned in a previous post 5 different approaches to writing unit-tests for data-access code. So now I will try to cover the first method in more detail. Among the 5, this method seems to be the most common one so far.
I will be using Sqlite as an in-memory database. An arguably better alternative is available, if you are happy to fork out some extra bucks on VistaDB.
So let’s reemphasize the desired characteristics of a good unit test, especially in the context of data-access code.
1. Isolated. A database (by definition) persists any change to its state. However, this is usually not desirable in a unit-test. A good test case runs in complete isolation from any changes made in other test-cases. I.e., any changes to the database from one test case should not be visible to other test-cases.
2. Repeatable. Regardless of how many times a unit test is executed, a consistent result is expected. For this to happen, a unit-test should not rely on presumptions about external conditions, especially a shared database.
3. Fast. If your test fixture cannot be executed every several minutes or so, there is only one thing that can possibly happen: developers will start abandoning it. And unfortunately, this is usually the case if you are connecting to a database system from test code, where the whole test-suite can take up to an hour to execute, if you are lucky. No one can run it frequently enough to make it useful, and hence no one will keep maintaining it.

IN MEMORY DATABASE
The main reason for using an in-memory database for unit-tests is that it is incredibly fast! Both in restoring zero state, in execution, and in cleaning up. Speaking of which, let’s take a look at the typical cycle of an in-memory database in a unit-test.
We are going to have NHibernate build the database schema from scratch before each test case, and dispose of it at the end of the test case. Then we start again, building the schema on a clean database for the next test-case. This way, we always have an empty sheet to work on for each test-case, without affecting (or being affected by) any database changes in other test-cases.
Here is an NUnit test case example using NHibernate.

[TestFixture]
public class CustomerRepositoryTest: InMemoryDBTestFixtureBase
{
	private ISession session;
	private CustomerRepository repository;
	
	[TestFixtureSetUp]
	public void FixtureSetUp()
	{
		InitialiseNHibernate(typeof(Customer).Assembly);
	}
	
	[SetUp]
	public void SetUp()
	{
		this.session = this.CreateSession();
		this.repository = new CustomerRepository(
			new FakeSessionManager(session));
	}
	
	[TearDown]
	public void TearDown()
	{
		this.session.Dispose();
	}
	
	[Test]
	public void CanQueryCustomerByLastname()
	{
		var customers = new List<Customer>(){
			new Customer(){
				FirstName="Peter",
				LastName="Griffin"},
			new Customer(){
				FirstName="Other",
				LastName="Lads"},
			new Customer(){
				FirstName="Meg",
				LastName="Griffin"},
			new Customer(){
				FirstName="Yet Another",
				LastName="Bloke"}};
		
		foreach(var cus in customers)
			session.Save(cus);
		session.Flush();
		foreach(var cus in customers)
			session.Evict(cus);
		
		var loaded = repository.QueryByLastname("Griffin");
		
		Assert.That(loaded.Count, Is.EqualTo(2));
		AssertLoadedDataEqual(loaded[0], customers[0]);
		AssertLoadedDataEqual(loaded[1], customers[2]);
	}
	private static void AssertLoadedDataEqual(Customer loaded, Customer saved)
	{
		Assert.That(loaded, Is.Not.EqualTo(saved)); // Make sure it's not cached data
		
		Assert.That(loaded.ID, Is.EqualTo(saved.ID));
		Assert.That(loaded.FirstName, Is.EqualTo(saved.FirstName));
		Assert.That(loaded.LastName, Is.EqualTo(saved.LastName));
	}
}

The plumbing for initializing NHibernate and building the in-memory database is managed by the base-class InMemoryDBTestFixtureBase. It is a very common practice to have this kind of base class for all database test-fixtures within a project, so we can turn our back on setting up the test database and concentrate on testing what we care about. Let’s take a look at the base class code.

public abstract class InMemoryDBTestFixtureBase
{
	protected static ISessionFactory sessionFactory;
	protected static Configuration configuration;

	public static void InitialiseNHibernate(params Assembly [] assemblies)
	{
		if(sessionFactory!=null)
			return;

		var prop = new Hashtable();
		prop.Add("hibernate.connection.driver_class", "NHibernate.Driver.SQLite20Driver");
		prop.Add("hibernate.dialect", "NHibernate.Dialect.SQLiteDialect");
		prop.Add("hibernate.connection.provider", "NHibernate.Connection.DriverConnectionProvider");
		prop.Add("hibernate.connection.connection_string", "Data Source=:memory:;Version=3;New=True;");

		configuration = new Configuration();
		configuration.Properties = prop;

		foreach (Assembly assembly in assemblies)
			configuration = configuration.AddAssembly(assembly);
		sessionFactory = configuration.BuildSessionFactory();
	}

	public ISession CreateSession()
	{
		var session = sessionFactory.OpenSession();
		new SchemaExport(configuration)
			.Execute(false, true, false, true, session.Connection, null);
		return session;
	}
} 

This base class takes care of configuring NHibernate with the Sqlite in-memory provider, and registering all the assemblies where our hbm files are located. The CreateSession method opens a new session against a fresh in-memory database, and gets NHibernate to build the schema from those hbm files into it.

BOTTOM LINE
We have achieved our 3 objectives of isolation, repeatability, and speed. Additionally, compared to the other 4 unit-test approaches, an in-memory database offers a unique advantage.
Each test-case is self-sufficient: it specifies its own pre-condition (initial data) and verifies the final outcome. Each test case is self-explanatory in revealing the intention of the test. Test readers will find it remarkably easy to follow each of the test-cases independently, without having to switch back and forth between Visual Studio and a database IDE or dataset XML (as is the case with nDbUnit).
Having self-sufficient test-cases can also come as a disadvantage, considering how bloated the test-code ends up, even for this rather simplistic example. In practice, this can get worse, since you have to deal with populating data for a whole chain of unrelated tables just to satisfy foreign key constraints when setting up the initial data. Not to mention the frustration when the database schema changes every now and then. It is very likely that data initialization will take up the majority of the test code, just to support a mere couple of lines of real test logic that we actually care about.
Another disadvantage is that not all functionality will work (or behave the same way) between the in-memory and the targeted database. Not to mention various subtle idiosyncrasies of Sqlite.
And if you don’t use a data-access framework that offers cross-database portability (like NHibernate does), this approach is not even an option.

How Do You Test Pipe and Filter?

In his notable MVC Storefront series, Rob Conery brought up a very interesting pattern, called Pipe and Filter. My first reaction to it was a bit of anxiety that it might hurt testability. Before I start with the problem, I think this pattern deserves a few introductory words, just in case you have stayed under a rock for the last 12 months and haven’t checked Rob Conery’s posts.

Pipe and Filter is a pattern recently made popular by Rob’s MVC Storefront episodes. Also known as the Filtration pattern, it is a very nice trick to produce a highly fluent Linq2Sql data repository (although, well, it’s not quite a repository anymore).

Basically, instead of having several special-purpose query filters on the repository interface like this:

var list = customerRepository
	.FindByStateAndAgeBelow("Tazmania", 20);

We can now use a far more fluent and flexible syntax through chained filtering statements like this:

var list = customerRepository.All()
	.WithState("Tazmania")
	.WithAgeBelow(20).List();

The magic behind it is .Net 3.5’s Extension Methods.

public static class CustomerFilter
{
	public static IQueryable<Customer> WithState(this IQueryable<Customer> query, string state)
	{
		return from cus in query 
			where cus.HomeAddress.State == state
			select cus;
	}
	public static IQueryable<Customer> WithAgeBelow(this IQueryable<Customer> query, int age)
	{
		// Customers younger than the given age were born after this date.
		var minDob = DateTime.Now.AddYears(-age);
		return from cus in query 
			where cus.BirthDate > minDob 
			select cus;
	}
}

Testing both of these filters is easy. Just create a collection of customer objects, execute the filter, then go ahead and check the result. Here is the unit-test code for the WithState filter.

// Stub Customer List
var list = new List<Customer>()
{
	new Customer() {HomeAddress = new Address() {State = "Illinois"}},
	new Customer() {HomeAddress = new Address() {State = "Tazmania"}},
	new Customer() {HomeAddress = new Address() {State = "NSW"}},
	new Customer() {HomeAddress = new Address() {State = "Tazmania"}}
};
var query = list.AsQueryable();

// Execute
var filtered = query.WithState("Tazmania").ToList();

// Verify
Assert.That(filtered.Count, Is.EqualTo(2));
Assert.IsTrue(filtered.Contains(list[1]));
Assert.That(filtered.Contains(list[3]));

Now imagine I work on an ambitious evil project, where I have a small method in my business logic that sends spam emails to all Tazmanian teens.

public void SendAdvertisement(string message)
{
	foreach(var customer in 
		customerRepository.All()
		.WithState("Tazmania").WithAgeBelow(20))
	{
		emailSender.Send(customer.EmailAddress, message);
	}
}

The question is now, how do we write unit test to verify this logic?

Had we used customerRepository.FindByStateAndAgeBelow("Tazmania", 20), we would be able to simply mock away the call to customerRepository.FindByStateAndAgeBelow() (hence, behavior or interaction-based verification), and we would be sorted.

But here, the problem with the Pipe and Filter pattern is the fact that it uses Extension Methods, which are essentially static methods! (And we all know static methods are the villains of the TDD world). They are not mockable, and we have to deal with the real filter implementations.

True, I could just use the same state-based verification approach that we used above in the WithState unit-test, by stubbing up a list of customer objects and then verifying the emailSender against the expected filtered customers. But this is really gross.
I don’t want to care about the internal behavior of the filters here. As a matter of fact, each filter already has its own unit-test (we wrote one above), and I don’t want to repeat myself here. All I really care about is whether my business logic makes the correct calls to the correct filters, and uses the filtration result as our spam targets.

I want to gather how you write unit-tests for the Pipe|Filter pattern: how you mock out filter logic from your business-logic tests. I have a quick thought in mind about writing a Rhino-Mock helper to cater for this scenario. I have yet to try it out, and I will write about it in the next post as soon as I do. But first, I would like to hear what other people think about this. Any comment?

Unit Testing Data Access Code

There are a lot of challenges in writing unit-tests for database-facing code. It typically requires a lot of setup code, a lot of effort to maintain, and it is hard to make repeatable. This has seemed to be a universal excuse for a lot of teams to drop the TDD practice.

So I gather here the various approaches that I am aware of for dealing with database unit-testing. In the next several posts, I will cover each of the implementations in detail, but let’s just have a quick overview of all the options. More than anything else, it’s really just a matter of taste, rather than situational. Hopefully after the next several posts you can make out the good and the bad of each approach.

  1. In-memory database using Sqlite
  2. Transaction rollback
  3. Restorable file-based DB
  4. Preset state with NDBUnit
  5. Fake DB provider with Linq

Throughout the major part of these posts, I will assume the use of nHibernate. Why? Simply because we need something to work with. These approaches are, however, equally applicable to virtually any ORM or data-access framework.

PS: It is perfectly valid to argue that these are all integration tests, rather than unit-tests. But I digress.

Fluent NHibernate

Jeremy Miller has done great work on Fluent NHibernate (and here) that offers strongly-typed nHibernate mapping configuration, requiring absolutely ZERO xml! You get all the type-safety, significantly reduced noise, convention-over-configuration simplicity, seamless bootstrap to IoC, and it even offers a set of libraries that allow you to write unit-tests for your mapping config. Yes, imagine it… TDD-ing your nHibernate configuration with a test-first approach!
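For a taste of what TDD-ing your mapping looks like, Fluent NHibernate’s PersistenceSpecification can round-trip an entity through the database and verify the mapped properties (a minimal sketch, assuming an open session and a mapped Customer entity):

new PersistenceSpecification<Customer>(session)
	.CheckProperty(c => c.FirstName, "Peter")
	.CheckProperty(c => c.LastName, "Griffin")
	.VerifyTheMappings();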

Combined with Ayende’s work on Linq2nHibernate, we have apparently now got to the point where it takes nothing but strongly-typed C# code to get the full capability of an ultra-powerful ORM technology like nHibernate.

Life without XML and SQL/HQL strings, oh what a wonderful world.

Code Generation in ORM

Recently, I had a heated discussion on a .Net list about ORM tools that are centered around code-generation, like SubSonic. (Okay, okay, SubSonic is more than ORM). While SubSonic has a very decent ORM solution, I can’t agree that a code-generator produces a decent ORM.

First of all, a code-generator *conveniently* produces all of an entity’s properties as public getters and setters. This convenience is exactly what I refer to as a violation of the encapsulation notion of OO principles. When an entity exposes all of its state, it becomes very easy for the layers above it to change that state. And this easiness is not a good thing. It leads to unmanageable code, where making a small change involves poring over thousands of lines of code. By letting anyone freely mutate the entity’s state, we allow business logic to be dumped into whatever class happens to be calling it.

The second problem is that all auto-generated classes are practically owned by the auto-generator. They steal them from you. Remember, if we attempt to change the auto-generated code, we will lose the change the next time we run the auto-generation. We no longer own the classes. If we can’t add any behavior to the entities, then they will be nothing more than dumb data cargo. Anemic Model, the same problem I had before. It’s just a CLR version of a database table. Then someone will need to give me a good reason why I should still call it Object Relational Mapping.

The argument about the productivity advantage offered by code-generation is not entirely valid either. There are 2 parts to a typical business application (among others): the domain objects and the database. SubSonic auto-generates domain objects from the database structure. Others (like NHibernate) generate the database structure from the domain objects. None can generate both, which is good news for us software developers, since we apparently are still worth some salary.

The question left is now which one you want to write yourself, and which one you want the machine to do for you.

If you’re writing a business application that involves complex business logic, then you will find the domain objects are the heart of your application. And you will want to write them yourself. The tedious part of managing persistence can then be delegated to the tool’s schema auto-generation. This is the most common approach with NHibernate.

Now if you’re working on a data-centric application, particularly one involving a complex data structure, then you will find the database is the heart of your application. You spend most of your time on tables and SQL scripts. In this case, the domain objects are just a boring part that merely sits there to bridge your data to the presentation. Let the code-generator do the brainless job for you. And SubSonic is really good at this. Castle AR and NHibernate also provide this approach.

Having the codebase owned by the tool is not necessarily a bad thing. If we don’t own the code, then we don’t have to spend the cost of maintaining it. The tool owns the code, and the tool nurtures it for you.

Another arguably good situation where you should use code-generation for ORM is when you’re working on a legacy database, typically when you are rewriting legacy code. This is probably true, but I would only use the auto-generated entities as data structures, not as objects. Pretty much treat them as datasets, and add an adapter layer to map them into domain objects, which will be the only things used by the rest of the application.

Either way, most people wouldn’t classify code-generators (like Linq2Sql and SubSonic) as ORM, because, quite obviously, there is no mapping going on there. It’s just a strongly-typed interface to database tables. To bridge the impedance gap between the 2 worlds, you will have to write your own mapping from the auto-generated *table-objects* into your own domain entities.
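That adapter layer can stay thin. As a purely hypothetical sketch (CustomerRow standing in for a generated table-object):

public static class CustomerAdapter
{
	// Treat the generated class purely as a data structure;
	// the domain Customer keeps its own behaviors and invariants.
	public static Customer ToDomainEntity(CustomerRow row)
	{
		return new Customer(row.CustomerId, row.FirstName, row.LastName);
	}
}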

Should Domain Entity Be Managed By IoC?

One question that has always puzzled people starting out with Domain Driven Design is how a domain entity can talk to repositories or services controlled by the IoC container, particularly if those entities are (and usually they are) loaded by an ORM like nHibernate.

There is a lot of controversy in DDD about whether a domain entity should depend on repositories or services in the first place. First, logic that requires a dependency on repositories or services indicates that it is cross-aggregate. And the basic principle is: logic that crosses an aggregate boundary should NOT live in a domain entity. It should live in services, and no domain entity should be aware of it. However, in reality, all but the simplest domain logic requires external services. For example, a Sales entity is responsible for calculating the payable price, and in order to do so, it needs to figure out the discount, which in turn involves a business-rule-engine. Hence a dependency from the Sales domain entity to the BRE service. So I think the question should not be about whether an entity should depend on services. We have to accept the fact that, more often than not, a domain entity will absolutely need services. Without them, you will most likely end up with an Anemic Domain Model, which at the extreme can be very, very bad, based on my experience.

Now back to the question: how should the domain get hold of the services, particularly if the creation is not within our control (e.g., it comes from the ORM)? Apparently this has been a regular topic in the Alt.Net and DomainDrivenDesign discussion lists, brought up every week or two. Some suggestions given to me by quite a few people, like Alasdair Gilmour, Larry Chu, et al, have given me an insight I had never expected before: IoC is not only about services. It’s also about domain entities.

Out of the various approaches that Larry pointed out, there are generally 2 viable solutions to satisfy the dependencies of a domain entity without tying it up with infrastructure detail. The first approach is to leverage an interceptor. If you use nHibernate, Billy McCafferty shows how we can inject the DI container from an nHibernate loader interceptor. My response to it: I don’t like the idea of injecting the DI container into the entity. But a similar approach can be used to inject the dependencies (instead of the container) into the entity. The Unity container has an elegant solution for this kind of scenario using the BuildUp method. You simply have to register your *domain entity as one of the IoC services*. Hence the title of this post. There is currently no similar functionality in Spring.Net or Castle Windsor, but it’s pretty straightforward to do it ourselves. What we need is a custom Dependency attribute, and to decorate the entity’s properties with it to mark them as injectable dependencies:

public class Sale
{
	private ICustomerRepository customerRepository;

	[Dependency]
	public ICustomerRepository CustomerRepository
	{
		set { this.customerRepository = value; }
	}
}
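The custom attribute itself is trivial; a minimal sketch:

[AttributeUsage(AttributeTargets.Property)]
public class DependencyAttribute : Attribute
{
}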

Then write an NHibernate interceptor that introspects the entity’s properties marked with the attribute, and resolves them from the IoC container:

public class DependencyInterceptor : IInterceptor
{
	public bool OnLoad(object entity, object id,
		object[] state, string[] propertyNames, IType[] types)
	{
		// Use "setter injection" to give the object its dependencies
		// for each property marked with the Dependency attribute
		foreach (PropertyInfo prop in entity.GetType().GetProperties())
		{
			if (prop.CanWrite &&
				prop.GetCustomAttributes(typeof(DependencyAttribute), true).Length > 0)
			{
				object dependency = IoC.Resolve(prop.PropertyType);
				prop.SetValue(entity, dependency, null);
			}
		}

		// Even though we gave the object its dependencies, return
		// the fact that we didn't modify the persistent properties
		return false;
	}

	// … All other methods of IInterceptor
}

Last, register the interceptor with the NHibernate session manager:

NHibernateSessionManager.Instance.RegisterInterceptor(new DependencyInterceptor());

The main drawback of using an nHibernate interceptor is that it’s limited to injecting dependencies into entities that are publicly accessible from their aggregate root. But I personally am not too concerned about that. I would think twice if I needed to push a dependency into an inaccessible entity of an aggregate in the first place. Because, IMO, the dependency of an entity within the aggregate is a dependency of the aggregate root as a whole, at least from a TDD point of view. It would be difficult to test-drive the aggregate root otherwise, and if that were the case, IoC-ORM integration would be the last thing I’d worry about.

The second approach, particularly if you don’t use nHibernate, is to use bytecode instrumentation. For instance, using post-compilation AOP, like PostSharp, to mangle the entity’s constructor so that it resolves its dependencies (decorated fields/properties) from the IoC container. This is by far the most powerful solution, and it covers all possible cases. It is also ORM agnostic: it applies to most kinds of ORM, SQL-mappers, or even no ORM at all. The drawback of this approach, or rather the drawback of code-mangling AOP in general, is that it involves too much dark magic, which makes things less clear to follow and can be very difficult to debug. For this reason, dynamic-proxy AOP is generally favored over code-mangling. Dynamic proxy, unfortunately for our purpose, is not an appropriate option.

Many IoC containers, mostly in Java, support the second approach out of the box. But I am not aware of any similar functionality in Windsor, or most .Net IoC frameworks in general. So we will need to get our hands a little bit dirty with PostSharp.