Speeding up Entity Framework migrations in integration tests
Important addendum
After writing this post, I realised the proposed final solution doesn’t cover every use case: the generated create script only creates tables, not views, for example.
I’m a full convert to the cult of integration tests. They provide:
- Increased confidence dealing with external services such as HTTP APIs and databases
- Reduced maintenance of unit tests: changing the implementation is easier as there’s no mocking of internal dependencies
- Increased confidence to accept automated PRs for dependency upgrades from tools like Renovate
Unfortunately, they do take longer to run than simple unit tests. To prevent colleagues becoming frustrated and disillusioned with the test suite, I set out to increase the speed of ours.
The fixtures use Testcontainers to spin up Docker containers on the CI agent. I used another library to ensure xunit only creates one of each container, which has the benefit of being faster, but also the downside of introducing shared state between tests. With tests running in parallel, again for optimal speed, it’s crucial to ensure the tests don’t interfere with each other.
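For reference, the container fixture itself is only a few lines with Testcontainers. The sketch below assumes the Testcontainers.MsSql module and xunit 2’s IAsyncLifetime; the fixture name is illustrative, and the library that shares a single instance across test classes isn’t shown.

using System.Threading.Tasks;
using Testcontainers.MsSql;
using Xunit;

// sketch: a SQL Server container started once and exposed to tests
public sealed class SqlServerFixture : IAsyncLifetime
{
    public MsSqlContainer Container { get; } = new MsSqlBuilder().Build();

    public Task InitializeAsync() => Container.StartAsync();

    public Task DisposeAsync() => Container.DisposeAsync().AsTask();
}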
To isolate the tests from each other I create a database per test, with a GUID for the database name. Whilst this allows the tests to be completely independent, over the course of development they became slower and slower. The reason, I discovered, was the increasing number of database migrations.
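The per-test database name itself is just a GUID dropped into the container’s connection string. A rough sketch, assuming SQL Server and Microsoft.Data.SqlClient, with a hypothetical helper name:

using System;
using Microsoft.Data.SqlClient;

// sketch: point each test at its own uniquely named database on the shared container
static string CreateTestConnectionString(string containerConnectionString) =>
    new SqlConnectionStringBuilder(containerConnectionString)
    {
        InitialCatalog = $"test_{Guid.NewGuid():N}" // unique database name per test
    }.ConnectionString;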
To perform the migrations we would resolve the database context and call MigrateAsync:
await using AsyncServiceScope scope = _factory.Services.CreateAsyncScope();
SampleDbContext dbContext = scope.ServiceProvider.GetRequiredService<SampleDbContext>();
await dbContext.Database.MigrateAsync();
This is very simple, but hides what it’s doing in the background:
- Create the database if it doesn’t exist
- Check what migrations haven’t already been applied
- Loop over the migrations and apply them individually
Applying migrations individually is the performance killer: with N tests and M migrations, it results in N × M migration calls to the database.
As time went on and the number of tests and migrations increased, intermittent failures started to show up during migrations. A build may take 5 minutes and fail all because of a single flaky test, which is very frustrating. To combat this I added retries using Polly.
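The retry was along these lines; a sketch using Polly v7’s WaitAndRetryAsync, with the exception filter and back-off shown as placeholders rather than our exact configuration, and dbContext resolved as in the snippet above:

using Microsoft.Data.SqlClient;
using Microsoft.EntityFrameworkCore;
using Polly;
using Polly.Retry;

// sketch: retry transient failures while applying the migrations
AsyncRetryPolicy retryPolicy = Policy
    .Handle<SqlException>()
    .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(attempt));

await retryPolicy.ExecuteAsync(() => dbContext.Database.MigrateAsync());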
Unfortunately, adding retries didn’t completely fix the problem. It increased the total test time and also increased contention on the database at the point when it was most problematic, i.e. when load was high. To constrain access to the database I implemented a rate limiter, again using Polly, with a maximum of 10 concurrent migrations and an unlimited queue size. Although this had the desired effect, it also meant we’d effectively limited test parallelism to 10, again driving up total test time.
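A concurrency cap like that maps onto Polly v7’s bulkhead policy; a sketch of roughly how it can be wired up, again with dbContext resolved as before:

using Microsoft.EntityFrameworkCore;
using Polly;
using Polly.Bulkhead;

// sketch: allow at most 10 concurrent migrations; the rest wait in an effectively unbounded queue
AsyncBulkheadPolicy bulkhead = Policy.BulkheadAsync(10, int.MaxValue);

await bulkhead.ExecuteAsync(() => dbContext.Database.MigrateAsync());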
The real solution is to perform the schema migrations more efficiently, and the only thing I could think of was to reduce the number of round-trips to the database. It turns out that we don’t really care about incremental migrations. In our tests we want a new database each time, so we can just create the schema in one go. To do that we have to generate the script and apply it:
await using AsyncServiceScope scope = _factory.Services.CreateAsyncScope();
SampleDbContext dbContext = scope.ServiceProvider.GetRequiredService<SampleDbContext>();
// this is using types normally hidden to application code
IRelationalDatabaseCreator databaseCreator = dbContext.Database.GetService<IRelationalDatabaseCreator>();
// needed for idempotency if retrying this method due to transient errors
await databaseCreator.EnsureDeletedAsync();
// creates database without schema
await databaseCreator.CreateAsync();
// script is not idempotent nor executed in a transaction
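// "GO" is a client-side batch separator, not T-SQL, so it is stripped before the script is sent as a single batch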
string script = dbContext.Database.GenerateCreateScript().Replace("GO", "");
await dbContext.Database.ExecuteSqlRawAsync(script);
Although it isn’t as simple as the original implementation, it’s much faster. Because it’s more efficient, it also allowed us to remove the rate limiting and unleash test parallelism. The end result: running ~150 tests went from ~5 minutes to ~2 minutes.