Wes Dean
Introduction
My team at Flexion became privy to a situation where several secrets (credentials, API tokens, etc.) were committed to a repository. One of our Flexion Fundamentals is to “be skeptical and curious. We don’t take things at face value.” Part of that skepticism is looking for potential security vulnerabilities and concerns with tools that we’re considering adopting or supporting. So, we ran MegaLinter on the repository and happened to detect these secrets that, while no longer active, were artifacts from previous development iterations. We were tasked with looking into the situation and providing advice and guidance about how to prevent situations like these from happening again.
In addition to running MegaLinter whenever Pull Requests (PRs) were submitted, we recommended a layer of protection before commits were created or pushed upstream. The pre-commit tool allows us to configure steps to run before git commits are created. These steps use git’s hooks functionality to invoke tools, such as secret scanners, that protect our developers before a risky change ever leaves their machines.
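To give a concrete picture of what that looks like, here is a minimal sketch of a .pre-commit-config.yaml file that wires in secret-scanning hooks. The specific hooks and pinned revisions below are illustrative examples, not the exact configuration we recommended:

```yaml
# Illustrative .pre-commit-config.yaml; pin `rev` to whatever releases you actually use.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: detect-private-key   # refuse commits that contain private keys
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.4
    hooks:
      - id: gitleaks             # scan staged changes for secrets and tokens
```

Running `pre-commit install` inside a cloned repository registers the configured hooks with git so they run on every commit.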
Our Solution
From an administrative perspective, we had no mechanism (and could not have one) to determine whether an individual developer did or did not use pre-commit or any other git hooks. Similarly, we couldn’t force individual developers to install or configure pre-commit on a per-system or per-repository basis. What we did know was that if a repository had no pre-commit configuration, its developers had no pre-commit protection in place. So, we wrote this tool to scan through all of the repositories in a GitHub organization and attempt to retrieve their pre-commit configuration files. For each repository, if the pre-commit configuration file is missing, empty, invalid, or lacking the elements pre-commit requires, the tool creates an issue in that repository.
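The following is a simplified sketch of that approach using the PyGithub and PyYAML libraries. It is not the tool’s actual source: the ORGANIZATION variable name, the issue title, and the validation rules shown here are illustrative assumptions (the PAT environment variable is described under “How it’s configured” below).

```python
"""Sketch: flag repositories in a GitHub organization that lack a usable pre-commit config."""
import os

import yaml  # PyYAML, used to check that the configuration file parses
from github import Github, GithubException  # PyGithub

ISSUE_TITLE = "Add a pre-commit configuration"  # illustrative issue title


def has_valid_precommit_config(repo) -> bool:
    """Return True if the repo has a parseable .pre-commit-config.yaml with a 'repos' list."""
    try:
        contents = repo.get_contents(".pre-commit-config.yaml")
    except GithubException:
        return False  # the file is missing
    try:
        config = yaml.safe_load(contents.decoded_content)
    except yaml.YAMLError:
        return False  # the file is not valid YAML
    # 'repos' is the top-level element pre-commit requires
    return isinstance(config, dict) and isinstance(config.get("repos"), list)


def main() -> None:
    gh = Github(os.environ["PAT"])  # GitHub Personal Access Token
    org = gh.get_organization(os.environ["ORGANIZATION"])  # illustrative variable name
    for repo in org.get_repos():
        if not has_valid_precommit_config(repo):
            repo.create_issue(title=ISSUE_TITLE, body="...")  # real issue body described below


if __name__ == "__main__":
    main()
```

The real tool builds a much more helpful issue body, as described under “When it runs” below.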
The Tool
Many of the tools we use are written in Python. We like Python because its expressive syntax, combined with an extensive package ecosystem, allows developers to pick up and understand our tools, make adjustments, and keep things running smoothly. In other words, our top priorities were readability and, by extension, maintainability. Along similar lines, we prioritized inline documentation written in plain and direct language. We target roughly half to two-thirds of each source code file being comments and documentation. Half of it is comments? Yeah. Fifty to seventy percent, on average.
Folks, I don’t know about you, but with all of the things I’ve got going on, I don’t remember what I had for lunch yesterday (it was actually the largest soft pretzel I had ever seen in my life, but that was an anomalous event), much less why I wrote a class or a method or a function the way I did six months down the road. We write as if we’re teaching the code to someone with no experience or familiarity with it, because we ourselves become inexperienced and unfamiliar with a tool way, way too quickly.
There’s more to code quality than comments
Yup. Lots more. We use tools like MegaLinter to run a battery of checks and scans against our code every time we make changes. These tools include security scans as well as code quality checks, linters, and style enforcement tools like pyright, pylint, black, and ruff. We use testing frameworks like pytest to make sure our stuff does what our stuff is supposed to do. Our dependencies are monitored and kept up to date using Dependabot and Renovate, which regularly create GitHub pull requests for us. We also strongly believe in transparency and clearly communicating what’s under the hood to those who use, consume, extend, and/or depend on our tooling. That’s why we publish Software Bills of Materials (SBOMs) every time changes are made to the tooling.
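To give a flavor of what those pytest checks can look like, here is a tiny, illustrative example around a hypothetical helper that validates a pre-commit configuration. It is not taken from the tool’s actual test suite:

```python
"""Illustrative pytest checks for a hypothetical config-validation helper."""
import yaml


def config_has_repos(text: str) -> bool:
    """Return True if the text parses as YAML and defines a top-level 'repos' list."""
    config = yaml.safe_load(text)
    return isinstance(config, dict) and isinstance(config.get("repos"), list)


def test_empty_file_is_rejected():
    assert not config_has_repos("")


def test_minimal_config_is_accepted():
    assert config_has_repos("repos: []\n")
```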
How it’s deployed
We reviewed a variety of deployment techniques and made the bold decision to not require a single mechanism to run the tool. Want to run it manually or as a cron job on your laptop? Cool. Run it in AWS Lambda? Sure. GitHub Action? Done. Jenkins Pipeline? Not a problem. Scheduled task on AWS ECS? Covered.
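For instance, running it in AWS Lambda only requires a thin handler around the tool’s entry point. The sketch below is not the tool’s actual code; run_scan() is a hypothetical stand-in for whatever function or command the tool really exposes:

```python
"""Illustrative AWS Lambda wrapper for the scanner; run_scan is a hypothetical placeholder."""


def run_scan() -> None:
    """Placeholder for the tool's main logic (hypothetical)."""
    ...


def lambda_handler(event, context):
    """Entry point invoked by AWS Lambda, e.g. on an EventBridge schedule."""
    run_scan()
    return {"status": "ok"}
```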
In addition to publishing the tool’s source code, we also provided the framework to containerize it and run it with Docker. With that in mind, it was important to remember that running containers may not have access to a filesystem or persistent/durable storage; the containers could be ephemeral and immutable. Even though the tool wasn’t a persistent application responding to service requests, we opted to use the Twelve-Factor App methodology. We even published images to DockerHub and GHCR so it’s only ever a Docker pull away. When the main branch of the repository is updated, images with the :edge tag are published; when a release is cut, images with the :latest tag as well as version-specific tags are automatically published. This whole process is automated, so bug fixes, dependency updates, etc. are available shortly after the source code is updated and the tests pass.
How it’s configured
We weren’t sure how folks would deploy the tool, so we opted for an environment variable-centric approach. For example, one of the required inputs is a GitHub Personal Access Token (PAT), a credential used to interact with GitHub without sharing passwords, so the tool looks for an environment variable named PAT. We also support .env files, which are, essentially, lists of environment variables and their values.
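Here is a short sketch of what that looks like in Python, assuming the python-dotenv package. The PAT variable name comes from the tool itself; ORGANIZATION is an illustrative stand-in for however the organization name is actually supplied:

```python
# Illustrative environment-variable loading; python-dotenv is an assumption,
# not necessarily what the tool uses.
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # merge key=value pairs from a local .env file (if any) into the environment

PAT = os.environ["PAT"]  # required: GitHub Personal Access Token
ORGANIZATION = os.environ.get("ORGANIZATION")  # hypothetical name for the GitHub organization variable
```

A matching .env file is just those same names and values, one KEY=value pair per line.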
When it runs
When the tool runs, it takes a PAT and the name of a GitHub organization, queries GitHub for all of the repositories under that organization, and then checks each repository individually. When it finds a repository without a valid pre-commit configuration file, it creates a GitHub issue in that repository. The issue describes what the tool found, provides a list of commands to run to fix it, and includes a sample pre-commit configuration file that can be copied and pasted directly into the repository. The goal was to make adopting these tools and processes as quick and painless as possible so that development stays smooth.
Summary
At Flexion, we take security very seriously. Our fundamental of “be skeptical and curious” is exemplified by our taking a good, hard look at tools, source code, configuration files, etc. – drafted internally or externally – and making sure that they align with our best practices. We strongly encourage teams to incorporate tooling to reduce the probability of a secret being included in a repository. By advocating for tools like pre-commit and helping to make the adoption of those tools as painless as possible, we help make our work and the work of those with whom we collaborate safer and more secure. Want to collaborate with team Flexion on your next project? Drop us a line!
Wes Dean, a Senior DevSecOps Engineer at Flexion, brings extensive experience in the UNIX and Linux world, dating back to the early 1990s, to his role. He supports a variety of U.S. Federal agencies, helping them work safer, faster, more efficiently, and more securely. Wes’s position as a member of the CMS Open Source Program Office Advisory Board’s CMS Source Code Stewardship Taskforce underscores his expertise and credibility. He is also a staunch supporter of MegaLinter and a contributor to the tool’s prose scanning functionality, among other improvements.