Rust impact on engineering management
In 2021, I published two articles about Rust:
- Rust, first impressions
- Rust, second impression
Time has passed, and I now manage a small team of experts at a Cloud provider who are developing what could be the future of IAM.
It goes without saying that the vast majority of this service is developed in Rust.
Today, I’d like to share with you my feedback on the advantages (but also the disadvantages) of the language from the point of view of engineering management in a professional context.
Context: why Rust?
Let’s start by setting the scene.
As indicated, my experience is set in the context of the development of a Cloud service, and more specifically the IAM of a Cloud provider.
IAM stands for Identity and Access Management.
This is a central and critical service for all Cloud providers, as it is systematically involved in every customer request.
It has two roles:
- Ensuring that the customer is who they claim to be (authentication),
- Deciding whether the customer is authorized to perform the requested action on the targeted resource, based on a more or less complex context and on policies usually defined by their administrator (authorization).
All of this for a wide range of target services (virtual machine management, network management, block storage management, object storage management, object storage access, cryptographic key management, and all the other services a cloud provider can offer).
It’s a heavily used service: it is called for every user request and for every resource accessed, created or modified, across all of the Cloud provider’s services and products. Performance is key.
It’s also an ultra-critical service in terms of availability: if the IAM fails, no service, no stored object, no resource can be accessed. It’s a total blackout. Availability is key.
Finally, it’s THE service on which our customers’ security depends.
In a zero-trust architecture, authentication and authorization systems are the only frontier between order and chaos. Security is key.
Quite a few constraints.
Without going into too much detail, we’ve added a fourth: our IAM is based on a principle of distributed authorization management using Biscuit technology.
So it’s a distributed system, with all the added complexity that brings.
Considering the intersection of all these constraints, no other language seemed more suitable than Rust, quite simply.
Team structure
Designing and developing an IAM is an extremely difficult task.
This is not just a technical problem, but first and foremost a logical one: we need to design an authorization model.
To design an authorization model, we need to understand and model:
- How all Cloud services work,
- How clients access and use them,
- How client organizations are designed,
- How each service manages the isolation of resources between clients (tenancy management).
There’s no question of using a “traditional” approach based on classic product theories and methodologies.
Not that they’re fundamentally wrong, but the boundary between functional and theoretical/technical constraints is virtually non-existent here.
We constantly need to go back and forth between the logic model (which requires technical skills to master) and the functional model (which requires putting ourselves in the customer’s shoes, whether internal or external).
The same people must be involved throughout the entire product development cycle: from analyzing and understanding customer needs, to drafting the specifications, to implementing them.
The cost of transferring and synchronizing information between several subteams would simply be too high, and above all too risky (loss of information, misunderstanding of specific concepts, failure to take into account theoretical and/or technical limitations).
This means building a team made up of experts in the field, but versatile enough to be able to put themselves in the shoes of customers (internal and external), to dialogue with them, to question them, to understand them, while also being able to intuit the theoretical and software consequences in real time.
This team doesn’t have to be very big (and shouldn’t be too big). Keep in mind that the cost of communication and synchronization within a team grows with the number of communication channels, c = n*(n-1)/2, where n is the number of people in the team.
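The formula counts the distinct pairs of people who may need to talk to each other. As a quick illustration (a minimal sketch; the function name is ours), here is how fast that number grows with team size:

```rust
/// Number of pairwise communication channels in a team of `n` people:
/// c = n * (n - 1) / 2
fn communication_channels(n: u64) -> u64 {
    // saturating_sub avoids an underflow for an empty team (n = 0)
    n * n.saturating_sub(1) / 2
}

fn main() {
    for n in [6, 8, 40] {
        println!("{n} people -> {} channels", communication_channels(n));
    }
}
```

A team of 6 has 15 channels, a team of 8 has 28, and a team of 40 has 780: the coordination overhead grows quadratically while headcount only grows linearly.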
So we need to find a few experts:
- senior Rust developers with this versatility level,
- a QA (Quality Assurance) expert,
- a PM/PO who will guarantee that all aspects of service design are covered,
- a security / compliance expert (don’t forget that our service is critical on this point),
- a technical documentation expert (there’s going to be a lot to write),
- and of course a manager with a minimum understanding of the subject and constraints to coordinate all this.
In addition, we need a small operational team (SRE) to deploy the service and develop all the tools needed to monitor and operate it.
So we’re not too far from the surgical team described by Frederick P. Brooks Jr. in the classic “The Mythical Man-Month”.
Recruitment
Recruiting Rust developers can be frightening for many companies.
They have the reputation of being rare and expensive.
True, but for good reason.
The market for Rust developers is growing slowly (the language isn’t quite 9 years old yet), but its undeniable qualities have earned it the attention of the big names in tech (Microsoft, AWS, Google and Apple, to name just a few).
As a result, as a company you’ll be competing with these giants in a relatively tight market. It’s intimidating.
However, despite the language’s reputation for a steep learning curve, the market is growing in all countries. Finding profiles isn’t mission impossible, as long as you pay the right price.
The relative difficulty of learning Rust and its positioning make it a second, third or tenth language for most developers.
It is a language they learned out of necessity or (often) out of desire, after having already cut their teeth on more mainstream and, above all, more accessible languages.
As a result, the vast majority of Rust developers are senior profiles, often with several years’ experience behind them.
They’ll be more expensive, but they’ll save you a lot of time and money in training, coaching and management, not to mention the savings on the cost of running the programs they develop for you.
This seniority and relative scarcity have another advantage when it comes to recruitment: most of the available profiles are good ones.
Rust’s learning curve has done part of the filtering work for you :-)
Product definition with senior Rust developers
In many companies, the product definition phase takes place within the “product” team dedicated to this role.
They then pass on the “Epics” and “Stories” to the development teams for implementation.
When it comes to designing an ultra-technical service such as IAM, this approach is unrealistic.
Authentication and authorization management are not concepts with which people are generally comfortable.
We all use these concepts every day, but most of the time we don’t realize it.
To be able to discover and characterize customer needs on this type of subject, you need to be an expert in the field, knowing what questions to ask and how to direct your investigations.
The product expert must be so close to the technical and theoretical constraints that it becomes difficult to distinguish them from an experienced developer.
The benefit of building a team of senior developers is that they generally have the right degree of versatility to go the other way, and become product experts.
Specification work then flows smoothly and naturally, because there’s no friction between two worlds that may sometimes find it hard to understand each other.
It’s also faster, because it requires fewer iterations.
The team is able to position itself on both sides of the mirror in real time, assessing both the customer benefit of a feature and its cost (in terms of development, performance, complexity, availability, security, etc.).
In my case, managing the product specification phase came (and continues to come) extremely naturally.
This is not directly linked to the fact that I work with Rust developers, but rather to the fact that Rust devs are generally senior devs, and that good senior devs know how to step outside their technical perimeter.
Fearless iteration
Once specifications have been drawn up (or part of them, as this can be an iterative process), your small team will move on to development.
We’ve opted for the tracer bullet strategy, where specifications are implemented progressively (feature set by feature set) but with production-level code quality.
This approach enables us to validate our theoretical model, and therefore our specifications, while gradually building up what will become the final product. This is unlike a POC, which is meant to be discarded after the demonstration or experiment (even though some companies put them into production).
The role of Rust within this iterative strategy is to enable you to progressively build the service, implement functional adaptations and refactor without fear of degradation or “erosion” of code quality.
The type safety enabled by the language’s algebraic data types allows you to build a relatively high-level abstract representation of the functional domain and evolve it with far less risk of silently breaking something elsewhere in the code.
This holds for business logic as well as for (strongly typed) data management, whether in memory or in the persistence layer: tools like SQLx can check at compile time that your queries match your database schema.
The control mechanisms built into the Rust compiler (and consequently into many libraries/crates in the ecosystem) allow you to evolve your product with a high degree of confidence.
Very few languages offer this guarantee.
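To make the point about algebraic data types concrete, here is a minimal sketch. The `Decision` enum and its mapping to HTTP status codes are hypothetical, not taken from our actual code base. Because Rust requires `match` expressions to be exhaustive, adding a variant to the enum turns every place that does not yet handle it into a compile error rather than a silent bug in production.

```rust
/// Hypothetical outcome of an authorization check (illustrative only).
#[derive(Debug)]
enum Decision {
    Allow,
    Deny { reason: String },
}

/// Maps a decision to an HTTP status code. The `match` is exhaustive:
/// if a variant such as `Challenge` is later added to `Decision`,
/// this function no longer compiles until the new case is handled.
fn http_status(decision: &Decision) -> u16 {
    match decision {
        Decision::Allow => 200,
        Decision::Deny { .. } => 403,
    }
}

fn main() {
    let decision = Decision::Deny {
        reason: "missing right volume:write".to_string(),
    };
    if let Decision::Deny { reason } = &decision {
        println!("denied ({reason}) -> HTTP {}", http_status(&decision));
    }
}
```

The guarantee only holds if catch-all `_` arms are avoided on domain enums, which is a convention worth enforcing in code review.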
I mentioned in the first chapter that our service had to be available, efficient and secure.
Here again, Rust plays an important role.
I won’t go back over its performance qualities, as the web is full of studies and benchmarks on the subject. To put it briefly, it enables us to achieve just about the best possible level of performance, all languages considered.
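In terms of security, the memory management model eliminates “for free” the memory-safety bugs (buffer overflows, use-after-free, etc.) that Microsoft and Google have each reported as the cause of around 70% of the serious security vulnerabilities in their C and C++ codebases.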
In terms of availability, using a compiled, statically typed language with a memory management model like Rust’s eliminates a huge number of sources of runtime crashes (invalid casts, null or dangling pointers, data races on shared memory, etc.). It does not remove all of them: deadlocks and stack overflows, for instance, remain possible.
It won’t magically provide resilience either (surviving failures of servers, networks, storage, etc.); that part has to be handled through a carefully designed architecture.
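As an illustration of the data-race point, here is a minimal sketch using only the standard library (`concurrent_count` is a hypothetical name). The counter lives inside the `Mutex`, so the only way to touch it is to take the lock; handing a plain mutable integer to these threads instead would be rejected at compile time. The compiler does not, however, prevent deadlocks: two threads taking two locks in opposite orders still compiles.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increments a shared counter from several threads and returns the total.
fn concurrent_count(threads: usize, increments: u64) -> u64 {
    // The data is owned by the Mutex: there is no way to reach it unlocked.
    let counter = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    // The guard releases the lock at the end of the statement.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // Bind to a local so the guard is dropped before `counter` goes out of scope.
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", concurrent_count(4, 1_000));
}
```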
CI/CD
The Rust ecosystem has reached a level of maturity that will enable you to set up satisfactory build chains.
In addition to integrated unit testing and benchmarking mechanisms (with cargo bench or tools like Criterion), there are tools:
- For checking transitive dependencies (presence of CVEs or license verification),
- For generating SBOMs (Software Bills of Materials), which have become a legal requirement in some countries, and a good practice in general. By recording and scanning them periodically, you’ll be alerted if a CVE appears on a dependency (even a transitive one) after your program has been compiled (and deployed),
- For managing cross-compilation (if your target OS / CPU is not the one used in CI pipeline),
- For packaging (rpm, deb package management),
- For publication (on public or private registries).
You have everything you need to test, verify, build and publish binaries of the highest quality.
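For reference, unit tests in Rust live next to the code they test and run with `cargo test`, without any extra framework. The sketch below is hypothetical: `right_covers` and its `*` wildcard syntax are invented for illustration and do not reflect Biscuit’s or our service’s actual rights format.

```rust
/// Hypothetical helper: does a granted right cover the requested action?
/// A trailing `*` acts as a prefix wildcard (e.g. "volume:*").
fn right_covers(granted: &str, requested: &str) -> bool {
    match granted.strip_suffix('*') {
        Some(prefix) => requested.starts_with(prefix),
        None => granted == requested,
    }
}

// Compiled only in test builds and executed by `cargo test`.
#[cfg(test)]
mod unit_tests {
    use super::*;

    #[test]
    fn exact_match() {
        assert!(right_covers("volume:read", "volume:read"));
        assert!(!right_covers("volume:read", "volume:write"));
    }

    #[test]
    fn wildcard_match() {
        assert!(right_covers("volume:*", "volume:write"));
        assert!(!right_covers("volume:*", "network:create"));
    }
}

fn main() {
    println!("{}", right_covers("volume:*", "volume:delete"));
}
```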
In my case, we deploy our programs as containers.
We systematically compile our programs for the linux-musl target (static compilation), with a performant memory allocator (check out this article to understand why).
This allows us to build Docker images that contain ONLY the binary (`FROM scratch`), reducing not only their size but also, and above all, their attack surface.
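Swapping the allocator goes through the standard `#[global_allocator]` attribute. The article referenced above does not name a specific allocator here, so treat the following as an assumption: in practice you would plug in a crate such as `mimalloc` or a jemalloc binding (one commonly cited motivation is that musl’s built-in malloc tends to be slower than glibc’s under multi-threaded load). To keep the sketch dependency-free, it wraps the standard `System` allocator and merely counts allocations, which shows the hook without claiming any performance benefit.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts allocations and delegates the actual work to the system allocator.
/// A production build would register a faster allocator here instead.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Registers the allocator for the whole program.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn allocation_count() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

fn main() {
    let before = allocation_count();
    // black_box discourages the optimizer from eliding the unused allocation.
    let buffer = black_box(Vec::<u64>::with_capacity(1024));
    println!("allocations in this step: {}", allocation_count() - before);
    drop(buffer);
}
```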
QA
Production services cannot be deployed without a serious QA (Quality Assurance) phase.
Over the past few years, my observations in the field have confirmed my intuition.
For equivalent quality, the cost of QA (which translates into the number and type of tests developed) for a component written in a dynamically typed language (Node.js, Python, etc.) was clearly higher than for components written in a statically typed language (Go, Java, Kotlin, Scala, C++ …).
Rust further reduces this cost, as the sum of natively executed compile-time checks, plus those that can be added by the developer, makes a whole category of QA tests quite simply unnecessary.
Of course, you’ll still need to test functional accuracy, non-regression, performance and resilience. But many of the so-called “technical” tests will no longer be necessary.
The small extra cost paid at development time by choosing Rust is recouped through lower QA costs.
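One example of a check “added by the developer” is the typestate pattern: encoding a request’s authentication state in its type, so that authorizing an unauthenticated request simply does not compile. The sketch below is hypothetical: the token comparison is a placeholder where a real service would verify a signed credential, and the `grants` list stands in for a real policy engine.

```rust
use std::marker::PhantomData;

// Uninhabited marker types for the two authentication states (hypothetical).
enum Unauthenticated {}
enum Authenticated {}

/// A request whose authentication state is part of its type.
struct Request<State> {
    user: String,
    _state: PhantomData<State>,
}

impl Request<Unauthenticated> {
    fn new(user: &str) -> Self {
        Request { user: user.to_string(), _state: PhantomData }
    }

    /// Consumes the unauthenticated request. Comparing with a fixed string
    /// is a placeholder for real credential verification.
    fn authenticate(self, token: &str) -> Result<Request<Authenticated>, String> {
        if token == "valid-token" {
            Ok(Request { user: self.user, _state: PhantomData })
        } else {
            Err(format!("authentication failed for {}", self.user))
        }
    }
}

/// Only accepts authenticated requests: passing a `Request<Unauthenticated>`
/// is a type error, so there is no "authorize before authenticate" test to write.
fn authorize(
    req: &Request<Authenticated>,
    grants: &[(&str, &str)],
    action: &str,
) -> bool {
    grants
        .iter()
        .any(|&(user, granted)| user == req.user && granted == action)
}

fn main() {
    let grants = [("alice", "volume:read")];
    let req = Request::new("alice");
    // authorize(&req, &grants, "volume:read"); // does not compile: wrong state
    match req.authenticate("valid-token") {
        Ok(authed) => println!("allowed: {}", authorize(&authed, &grants, "volume:read")),
        Err(e) => println!("{e}"),
    }
}
```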
Team management
As indicated in the “Team structure” and “Recruitment” chapters, by choosing a challenging language, you condemn yourself to recruiting fewer, but more experienced, developers.
There are two enormous advantages to this situation:
- You can keep the team to a size that allows you to be individually effective as a manager (between 6 and 8 people).
- The seniority of most of the engineers on the team will save you an enormous amount of time and energy.
You need to focus mainly on three things:
- Defining and maintaining the clarity of the product/mission objective (it will inevitably evolve over time),
- Keeping your team at a high level of quality and expectation,
- Levelling the road ahead of your team, so that it can move forward undisturbed by all the inevitable hassles and irritants of big companies.
The last point is crucial.
Your small surgical team is made up of experts in their fields.
They come with their convictions, forged and tested by many years of experience.
The team must be able, to a certain extent, to define its work processes and choose the tools it works with.
Once these conditions are met, progress should be relatively rapid. Your role as manager is to protect the team from start-and-stop situations caused by the company’s processes or by its dependence on other teams.
You have to get ahead of them, anticipate internal difficulties and, as far as possible, resolve them before they reach the team.
They’ll have their work cut out for them with the difficulties inherent in the product and the customers.
Financial considerations
Let’s end with a point that’s further removed from engineering management, but nonetheless important.
As I wrote, recruiting senior profiles comes at a cost.
To attract them in the face of stiff competition, you need to assess their market value and pay them accordingly.
Some companies have a very quantitative view of their engineering teams, and some human resources teams are incentivized to control their wage bill (especially when recruiting).
The vision I’ve just described in this article is therefore not naturally and universally understood by many companies.
You’ll have to defend it, explaining that a team of 6 highly paid engineers can do the same work as an average team of 40, and that in the end, the first scenario benefits the company in three ways:
- It will cost less in direct costs (payroll),
- It will cost less in indirect costs (management, support services, equipment, etc.),
- It will certainly lead to a better-quality product or service (which will consequently be more profitable for the company).
Choosing Rust and all its implications for the development of your service therefore makes financial sense.
Conclusion
I don’t receive any royalties every time a company or team chooses Rust.
I’ve shared my modest experience with you as objectively as I can.
Rust isn’t a magic technology, and choosing it won’t guarantee the success of your product or service.
I simply think, as I’ve explained (at some length), that this decision brings a number of positive side effects from an engineering management point of view.
It’s probably not specific to this technology, and the same conclusions could be applied to other, more demanding languages.
Many companies choose their technical stacks according to the opposite criteria: the availability of profiles on the market, their average costs, …
They often pay for this in indirect costs: overly rapid growth in team size, unstable or poor-quality products, management difficulties, high staff turnover and, as a result, erosion of internal product knowledge and quality.
For many, the choice of a demanding technical stack is counter-intuitive.
I hope this article has enlightened you as to why you should consider it.