At Clear Street, we build products and experiences that move billions of dollars every day. Under the weight of that responsibility, testing is a priority. Collaborating with the business and using a shared language when defining the tests helps us reduce risk and increase confidence in shipping quality code. BDD and E2E testing are the weapons of choice for us.
BDD (behavior-driven development) is a methodology that helps developers and business stakeholders collaborate by building a shared language for product specification, where the specification itself is testable code.
E2E testing (or end-to-end testing) relates to integration tests that are designed to test the system as whole, as opposed to unit tests, which test a single unit in isolation, typically a class or small set of code files.
There are many different flavors of BDD and E2E testing. In this blog, we’ll cover where this testing meets Clear Street, how we use these methodologies, the tooling we built around them, where they help us and where they don’t, challenges, and where we go next.
Let’s rewind 3 years to the year 2020. That was a fun year, right? The birds were chirping, and we were all stuck inside. Also that year, management decided that it was time to replace our stable-but-also-dying-inside transaction engine (named bank), with a new-and-shiny-architected-for-scale engine (named bk).
💡 This was a bold, game-changing decision by management (perhaps all that indoor time was getting to them). We’ve discussed this migration in detail, see Part 1, Part 2 & Part 3.
The old bank engine was built by our co-founder and Chief Technology Officer Sachin Kumar in the very early beginnings of the firm. The new bk engine was to be built by a new team who lacked both an understanding of the existing engine and limited prime brokerage knowledge in general.
They started by reverse-engineering the old engine, speaking with the various business experts at the firm, writing unit tests and manually testing the new engine, but this wasn’t very fruitful.
Progress was extremely slow and bugs kept creeping up (the legend goes that the entire team went completely bald from all the hair pulling).
The team lead, Christian, decided enough was enough and started working on a solution: writing the product specification as tests in a language that both the business and engineering teams understand. He wrote a tool called drone to run those tests against the entire system.
That specification was written by engineers and the business together and was tested against both the old and new system. It proved invaluable to getting the new system shipped with great quality.
drone uses a Python library called behave to run tests written in a natural language style (written using the Gherkin syntax). The tests are grouped into features, where each feature has several scenarios that define how the feature behaves.
Here’s an example of one of our scenarios:
The scenario is composed from a series of steps, where each step can be one of:
The scenario in the example sets up an instrument (a stock in this case) and an account in the system, then buys 5 shares for that stock in the account. It next tests that the trade was created and ledgers were updated correctly. Then, it rolls over the entire system 2 days forward and tests that the trade was settled and ledgers were updated correctly again.
Each of these steps is matched to actual Python code that talks to various components in the system to make this work behind the scenes. For example, the step to create an account is defined like this:
behave searches all of the step definitions to find a step that matches the text, then parses the arguments (with the help of custom defined parsers we defined for special types) and calls this function, which is standard Python code.
We were running this manually from the command-line for some time before we had the time to improve on our testing story. We wanted those tests to run automatically each day or with a button click on a specific pull request (we didn’t want to run automatically on a pull request, more on that later), so that we’ll see breaks early on and fix them quickly, fastening our feedback loop, and increasing confidence in our releases.
To build, we relied on Ivy, a service to deploy an ephemeral development environment with local kubernetes, which was developed by our wonderful infrastructures team.
💡 Read more about Ivy in this blog post.
We deploy our main development branch for the daily tests, or a specific branch for testing a pull request. We start all of the services we need to run the tests, including our drone server. We then send a request to the drone server to run all scenarios which are tagged with the service we’re testing (the example scenario from above is tagged as bk.business_requirements, meaning it’s a scenario that’s part of the bk test suite).
We started with just bk, but quickly we wanted to add this to other services which our team owns and also wanted to increase adoption with other teams in the firm.
So, we built three Python libraries to help us making setting up e2e tests for a service a pleasant experience:
Here’s how we use libe2e to run the e2e tests for bk:
Notable details on the configuration include:
Here’s what a successful e2e run looks like on Slack (in this example, for our obligations service):
And here’s a bad one to compare:
We have an on-call person designated to look into any failures that arise and fix them. And for our more critical services like bk, we do not deploy to production unless we have a successful e2e run for the artifact we want to deploy.
Following our continuous integration setup, we wanted to provide a UI to make the experience even better. This aimed to help with both running the tests locally (using CLI or the API was cumbersome), and tracking the test results, both locally and for CI failures.
Here’s how the UI looks before we run anything:
We select scenarios to run using the left panels, either by using the “Features” view or the “Tags” view. Here we chose all of the scenarios for the custody team and a single scenario from the bk.business_requirements tag. We can see this results in 95 scenarios in the top button which we can click to run the scenarios.
Once we run a set of scenarios, we’ll see a session in the top left panel which we can click to track the test live as it runs:
If we have an error we can drill down into the scenario to see which step failed, and with what error:
For tests that run using continuous integration, if there’s a failure we keep the environment alive, and you can land into drone UI straight from the slack link to troubleshoot the error.
Not everything is rosy, of course. Here are some the challenges we are facing, and maybe this is not a perfect fit for all use cases:
We are far from done, and there are a lot of avenues to improve in the future. Here’s a partial list:
Using BDD and E2E tests greatly help us ship quality code. Benefits include:
All in all, we highly recommend using BDD to create E2E tests defined with a shared language for projects where correctness is critical and the business language is complicated.
We are far from done, but very proud of what we achieved so far.
Clear Street does not provide investment, legal, regulatory, tax, or compliance advice. Consult professionals in these fields to address your specific circumstances. These materials are: (i) solely an overview of Clear Street’s products and services; (ii) provided for informational purposes only; and (iii) subject to change without notice or obligation to replace any information contained therein.
Products and services offered by Clear Street LLC, member FINRA and SIPC. Additional information about Clear Street is available on FINRA BrokerCheck, including its Customer Relationship Summary.
Copyright © 2023 Clear Street LLC. All rights reserved. Clear Street and the Shield Logo are Registered Trademarks of Clear Street LLC