#### 23 Apr 2015

# What test factories are hiding from you

Do you use test factories? You may be missing valuable feedback from your tests that could lead to better design.

If you're like me, you've started using factories right about the same time you started testing. They are a great way to create a lot of data to use in your tests. Without them, you would have to do all of that hard work yourself. Recently however, I've been rethinking my default "use factories" approach to testing.

## Factories make bad design easy

Test Driven Development (TDD) offers a number of benefits. The self-testing code
that results from diligent application of TDD allows you to refactor your code
with confidence. The primary benefit of TDD however, is supposed to be an
improvement in design ^{[1]}. Yet I see people
missing out on these design benefits even when they write the tests first and
spend the time to refactor. I think that one of the problems is the ease of
using factories to generate large quantities of data. From here on out I will be
showing what I mean by going through code covered by tests written with
factories, and discussing what other methods of testing might help us to uncover
design flaws.

To begin, I will explain the feature that we are trying to build. The project's requirements are to build a feature that returns me the minimum and maximum projected revenue for all of my customers for the next year. Each customer has a rate assigned to them that indicates a minimum and maximum percentage of this year's revenue we expect to get next year.

For example, if I have a customer who provided me with $1,000,000 in revenue last year, and it has been determined that their minimum projected revenue for this year is 80% of that, and the maximum projected revenue for this year is 110%, the minimum projected revenue would work out to be $800,000, and the maximum projected revenue would work out to be $1,100,000.

Finally, the actual feature is to calculate the sum of these amounts for multiple customers. So if you have one customer projected to have a minimum revenue of $800,000 and a maximum of $1,100,000, and another who is projected to have a minimum of $600,000 and a maximum of $900,000, the total minimum revenue will be $1,400,000 and the total maximum revenue will be $2,000,000. If this example seems contrived, it's because I've changed the names of things to protect the innocent (Boltmade's clients), and this is the best I could come up with.

All of the code from here on out can be found at the GitHub repository ericroberts/factories. The individual refactorings can be found in pull requests. We'll go through them all in this post.

Here's the test for the feature I've just described:

```
1 RSpec.describe Estimator do
2 subject { customer.estimator }
3 let(:customer) { create :customer }
4
5 describe "#projection" do
6 it "should return the sum of the estimated min and max projections" do
7 expect(subject.projection).to eq [
8 customer.revenue * customer.rate.min,
9 customer.revenue * customer.rate.max
10 ]
11 end
12 end
13 end
```

Here are the test factories:

```
1 FactoryGirl.define do
2 factory :estimator do
3 end
4
5 factory :customer do
6 association :rate
7 association :estimator
8 revenue 100
9 end
10
11 factory :rate do
12 min 80
13 max 90
14 end
15 end
```

And finally, here's the code that actually implements the feature:

```
1 class Estimator < ActiveRecord::Base
2 has_many :customers
3
4 def projection
5 customers.inject([0,0]) do |(min, max), customer|
6 min += customer.revenue * customer.rate.min
7 max += customer.revenue * customer.rate.max
8 [min, max]
9 end
10 end
11 end
12
13 class Customer < ActiveRecord::Base
14 belongs_to :rate
15 belongs_to :estimator
16
17 # revenue method added by ActiveRecord
18 end
19
20 class Rate < ActiveRecord::Base
21 def min
22 self[:min] / 100.to_f
23 end
24
25 def max
26 self[:max] / 100.to_f
27 end
28 end
```

Doesn't look too bad, does it? Let's see what we can discover by refusing to use factories.

## Without factories

We're going to write this without using factories at all. This means we will have to provide all the data ourselves. I'm going to choose to do this with stubs. Here's what the same test looks like when we stub all of our data:

```
1 RSpec.describe Estimator do
2 subject { Estimator.new }
3
4 describe "#projection" do
5 let(:customer) { double }
6 let(:rate) { double }
7
8 before do
9 allow(subject).to receive(:customers).and_return([customer])
10 allow(customer).to receive(:revenue).and_return(100)
11 allow(customer).to receive(:rate).and_return(rate)
12 allow(rate).to receive(:min).and_return(80)
13 allow(rate).to receive(:max).and_return(90)
14 end
15
16 it "should return the sum of the estimated min and max projections" do
17 expect(subject.projection).to eq [
18 customer.revenue * customer.rate.min,
19 customer.revenue * customer.rate.max
20 ]
21 end
22 end
23 end
```

Stubbing all of those methods is painful. And worse, if any of the methods being called change, we'll have to change the stubs. That's not very good. And yet, I will argue that this pain is a good thing. This pain is alerting us to the presence of bad design.

If we look at those tests, what is that we are stubbing? First we have to stub the customers method to return our own Customer double. Then we have to stub not one but two methods on customer to return specific values.

So what is this test telling us? I would argue that this test is telling us that Estimator knows too much about Customer. This is where we should step back and figure out exactly what it is we want from Customer. Why does Estimator need to know the customer's revenue and the customer's rate? What is it we are doing here?

Our original feature description wanted us to sum the customer's projected revenue. We are doing this, but we implemented it so that all the work is in Estimator. Estimator really just cares about what the customer's projected revenue is, and yet we have implemented it so that Estimator knows how to calculate a Customer's projected revenue. That feels like the responsibility of a Customer, so let's make it one.

Here's the test for the new method on Customer:

```
1 RSpec.describe Customer do
2 subject { Customer.new(revenue: 100) }
3 let(:rate) { double }
4 let(:min) { 80 }
5 let(:max) { 90 }
6
7 before do
8 allow(subject).to receive(:rate).and_return(rate)
9 allow(rate).to receive(:min).and_return(min)
10 allow(rate).to receive(:max).and_return(max)
11 end
12
13 it "should return the min and max projection" do
14 expect(subject.projection).to eq [
15 subject.revenue * rate.min,
16 subject.revenue * rate.max
17 ]
18 end
19 end
```

Then we can implement this method on Customer like so:

```
1 class Customer < ActiveRecord::Base
2 belongs_to :rate
3 belongs_to :estimator
4
5 def projection
6 [
7 revenue * rate.min,
8 revenue * rate.max
9 ]
10 end
11 end
```

Now we can revisit our Estimator tests and update them to look like this:

```
1 RSpec.describe Estimator do
2 subject { Estimator.new }
3
4 describe "#projection" do
5 let(:customer) { double }
6 let(:projection) { [80,90] }
7
8 before do
9 allow(subject).to receive(:customers).and_return([customer])
10 allow(customer).to receive(:projection).and_return(projection)
11 end
12
13 it "should return the sum of the estimated min and max projections" do
14 expect(subject.projection).to eq [
15 projection.min,
16 projection.max
17 ]
18 end
19 end
20 end
```

And the updated Estimator code:

```
1 class Estimator < ActiveRecord::Base
2 has_many :customers
3
4 def projection
5 customers.inject([0,0]) do |(min, max), customer|
6 min += customer.projection.min
7 max += customer.projection.min
8 [min, max]
9 end
10 end
11 end
```

I think we've definitely made an improvement here. Estimator know only knows one method on Customer. But is there more we could do? Let's leave Estimator for a second and go back to our current tests for Customer.

```
1 RSpec.describe Customer do
2 subject { Customer.new }
3
4 describe "#projection" do
5 let(:rate) { double }
6
7 before do
8 allow(subject).to receive(:rate).and_return(rate)
9 allow(rate).to receive(:min).and_return(min)
10 allow(rate).to receive(:max).and_return(max)
11 end
12
13 it "should return the min and max projection" do
14 expect(subject.projection).to eq [
15 customer.revenue * rate.min,
16 customer.revenue * rate.max
17 ]
18 end
19 end
20 end
```

To me, this looks like Customer knows too much about Rate. Why does Customer care that Rate has a min and a max? What is it that we actually want to get from customer? When you really think about it, we want to multiply a customer's revenue by a customer's rate. The fact that Rate happens to be represented with a minimum and maximum value shouldn't matter to us. This is the code we want:

```
1 def projection
2 rate * revenue
3 end
```

So just how do we go about doing this? It's important to remember here that `*`

is just another method. `1 * 2`

is just syntax that calls the method `*`

on `1`

and passes `2`

as the argument. There is special syntax that allows us to
leave off the `.`

which makes it look different. If we imagined that `*`

was
really called `times`

, it would look like this: `1.times(2)`

.

With that knowledge, we can go ahead and implement the `*`

method on Rate.
Here's our test:

```
1 RSpec.describe Rate do
2 subject { Rate.new(min: min, max: max) }
3 let(:min) { 80 }
4 let(:max) { 90 }
5
6 describe "#*" do
7 let(:multiplier) { 100 }
8
9 it "should return the two possible rates" do
10 expect(subject * multiplier).to eq [
11 min * multiplier,
12 max * multiplier
13 ]
14 end
15 end
16 end
```

And the implementation:

```
1 class Rate < ActiveRecord::Base
2 def min
3 self[:min] / 100.to_f
4 end
5
6 def max
7 self[:max] / 100.to_f
8 end
9
10 def * other
11 [
12 min * other,
13 max * other
14 ]
15 end
16 end
```

Now we can multiply a Rate by some other number and get an Array back that contains the minimum and maximum values as applied to that other number.

This now simplifies our code in Customer:

```
1 class Customer < ActiveRecord::Base
2 belongs_to :rate
3 belongs_to :estimator
4
5 def projection
6 rate * revenue
7 end
8 end
```

But what should we test now? One of the points of this was for Customer to know
less about Rate. Right now however, our tests still know about Rate so we can
check the return values. But maybe we shouldn't be testing the return values at
all? Rate already has a test that asserts what happens when you call the `*`

method on it. Therefore, all we want to test here is that we called that method
properly, and with the correct arguments. Now our Customer test looks like this:

```
1 RSpec.describe Customer do
2 subject { Customer.new(revenue: revenue) }
3 let(:revenue) { 100 }
4
5 describe "#projection" do
6 let(:rate) { double }
7
8 before do
9 allow(subject).to receive(:rate).and_return(rate)
10 end
11
12 it "should send the * message to rate" do
13 expect(rate).to receive(:*).with(revenue)
14 subject.projection
15 end
16 end
17 end
```

Now Customer looks simpler, and all it knows is that Rate has a `*`

method. But
what happens when we go back out to Estimator? What does it know still?

```
1 class Estimator < ActiveRecord::Base
2 has_many :customers
3
4 def projection
5 customers.inject([0,0]) do |(min, max), customer|
6 min += customer.projection.min
7 max += customer.projection.max
8 [min, max]
9 end
10 end
11 end
```

Estimator still knows what it is expection from the customer's projection method. Conceptually all we want to do is get the sum of the customer's projections. It would be nice if we could do something more like this:

```
1 class Estimator < ActiveRecord::Base
2 has_many :customers
3
4 def projection
5 customers.inject([0,0]) do |sum, customer|
6 sum + customer.projection
7 end
8 end
9 end
```

So, let's make our code work like that! First, we're going to need an object that takes a min and a max itself and knows how to add other mins and maxes to it. For lack of a better name, we'll call it MinMax. Here's the test:

```
1 RSpec.describe MinMax do
2 subject { MinMax.new(min, max) }
3 let(:min) { 80 }
4 let(:max) { 90 }
5
6 describe "#+" do
7 let(:other) { [min, max] }
8
9 it "should return a new object that responds to min and max" do
10 new_min_max = subject + other
11 expect(new_min_max.min).to eq subject.min + other.min
12 expect(new_min_max.max).to eq subject.max + other.max
13 end
14 end
15 end
```

And here's the implementation:

```
1 class MinMax
2 attr_reader :min, :max
3
4 def initialize(min, max)
5 @min, @max = min, max
6 end
7
8 def + other
9 new_min = min + other.min
10 new_max = max + other.max
11 self.class.new(new_min, new_max)
12 end
13
14 def self.zero
15 new(0, 0)
16 end
17 end
```

We can use this new class like this:

```
1 MinMax.new(1, 10)
2 #=> #<MinMax:0x007ff19604e2e8 @min=1, @max=10>
3
4 MinMax.new(2, 10)
5 #=> #<MinMax:0x007ff193045650 @min=2, @max=10>
6
7 MinMax.new(1, 10) + MinMax.new(2, 10)
8 #=> #<MinMax:0x007ff1950009b8 @min=3, @max=20>
9
10 MinMax.new(1, 10) + [2, 10]
11 #=> #<MinMax:0x007ff1950009b8 @min=3, @max=20>
```

Now we can refactor Estimator like so:

```
1 class Estimator < ActiveRecord::Base
2 has_many :customers
3
4 def projection
5 customers.inject(MinMax.zero) do |minmax, customer|
6 minmax + customer.projection
7 end
8 end
9 end
```

Which can now be written even more succinctly as:

```
1 class Estimator < ActiveRecord::Base
2 has_many :customers
3
4 def projection
5 customers.map(&:projection).inject(MinMax.zero, :+)
6 end
7 end
```

And we're done! Now our final implementation code looks like this:

```
1 class Estimator < ActiveRecord::Base
2 has_many :customers
3
4 def projection
5 customers.map(&:projection).inject(MinMax.zero, :+)
6 end
7 end
8
9 class Customer < ActiveRecord::Base
10 belongs_to :rate
11 belongs_to :estimator
12
13 def projection
14 rate * revenue
15 end
16 end
17
18 class Rate < ActiveRecord::Base
19 def min
20 self[:min] / 100.to_f
21 end
22
23 def max
24 self[:max] / 100.to_f
25 end
26
27 def * other
28 [
29 min * other,
30 max * other
31 ]
32 end
33 end
34
35 class MinMax
36 attr_reader :min, :max
37
38 def initialize(min, max)
39 @min, @max = min, max
40 end
41
42 def + other
43 new_min = min + other.min
44 new_max = max + other.max
45 self.class.new(new_min, new_max)
46 end
47
48 def self.zero
49 new(0, 0)
50 end
51 end
```

We have a bit more code than before, but the complexity of individual pieces has gone down. One objection I sometimes run into when proposing refactorings like this is that the complexity of the system as a whole has gone up. This is certainly true. In fact, we can use tools to assess what has changed. I like to use the Ruby tool Flog.

Here's the scores before the refactoring:

```
1 35.4: flog total
2 5.9: flog/method average
3
4 20.7: Estimator#projection lib/estimator.rb:6
5 4.1: Rate#min lib/rate.rb:4
```

And after:

```
1 46.1: flog total
2 3.8: flog/method average
3
4 9.2: MinMax#+ lib/min_max.rb:8
5 7.5: Estimator#projection lib/estimator.rb:7
6 4.8: Rate#* lib/rate.rb:12
7 4.4: main#none
8 4.1: Rate#min lib/rate.rb:4
```

As you can see, the overall complexity of the system went up, as expected. But now the most complex method is a little less than half of what it was before. I prefer for the complexity of my pieces to go down even if the total complexity goes up, because there is a point in every software project where you can no longer understand the whole thing anyway. When that happens, I would like to have small pieces I can easily understand and work with.

## So what's wrong with this kind of testing?

If you've been paying attention, or you're familiar with this kind of testing, you may have had some hesitancy about using stubs and mocks to test. They have problems themselves. Whenever you stub a method and make it return a value, you are stating that you expect that method to exist and to return a value like the one you are returning. If that code ever changes, you won't know, your test with the stubbed value will continue to work.

Thankfully, there are ways to help mitigate this problem. RSpec 3 adds
`instance_double`

and `class_double`

which won't allow you to stub methods
that don't exist. This gets us part of the way there, but it won't ensure that
those methods return anything like what we say they return in our stubs.

So the problem remains. It is possible to write tests this way, change the behaviour of collaborators, and have your tests still pass. This is not good. The test we started with doesn't have this problem. It will throw errors when methods on collaborating objects change because we are dealing with real collaborators.

There's a place for both kinds of tests. It's not that you should always test like this, or always test like that. It seems to me that the best tests for ensuring your system works (like our first test), are not the same as the best tests for getting the most design benefit out of TDD. Personally, I use factories less often than I did in the past, and use this technique more often. I supplement this with more integration tests so I can maintain confidence that everything is still working. This is what's working for me right now.

If you have any questions or comments about this post or the code discussed, I would love to hear them! You can comment here, on the repository, or tweet at me at @eroberts.

Good luck with your testing!