Wes Dean
Overview
One of the Flexion Fundamentals is to “never stop learning” and, you know, sometimes it’s just easier to see an example of how to do something, play around with it for a bit, and figure out how to turn that experience into usable knowledge. Sometimes, however, we need to protect some of our work (e.g., for contractual reasons) from wide-open, public access by the entire world.
GitHub allows repository managers to delegate access to individual users or to teams, but not to organizations. Teams are subsets of organizations. So, to work around this, we came up with the idea of creating a “team” whose membership consisted of the membership of the organization. Therefore, by granting access to a team, we could grant access to the entire organization. We were so close! Unfortunately, GitHub doesn’t provide an out-of-the-box way to create a team containing all members of an organization. GitHub Enterprise Cloud (GHEC) gives us the ability to programmatically maintain team membership by connecting an organizational account with an Identity Management (IdM) / Identity Provider (IdP) service. That wasn’t a road we were prepared to explore, so we looked at some other options. After some more research, we determined that GitHub gave us all of the tools we needed to enable this access control model; we just needed to glue the pieces together.
GitHub Organization to Team Synchronization Tool
We have several tools that automate processes for us – including some that interact with GitHub – that were written in the Python programming language. Python is an open-source, high-level, general-purpose programming language that’s been around since the early 1990s. While used for many, many different purposes, one niche where Python shines is in systems engineering. So, we opted to focus our efforts on using Python to build a tool to meet our needs.
Similarly, we’ve used PyGithub, a Python library for accessing the GitHub REST API. That is, we could use our Python knowledge and experience to interact with GitHub and script our interactions with repositories, users, organizations, teams, etc.
So, that settled it. PyGithub for the win. The tool, a fairly straightforward Python script, would run three high-level steps:
- Figure out who’s in an organization
- Figure out who’s on a team
- Synchronize the team with the organization
Step 1: Organizational Membership
The PyGithub library provides a very convenient way to determine who’s in an organization. Instantiate an object to represent the organization and use the get_members() method to get its members.
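As a rough illustration of this step (a minimal sketch, not the tool’s actual code), the helper below collects the logins of everything returned by get_members(). The function names, the token, and the organization name are hypothetical; get_organization() and get_members() are PyGithub calls. Newer PyGithub releases prefer passing an auth object rather than a bare token, so treat the connection line as an assumption about your installed version.

```python
def member_logins(container):
    """Return the set of logins for any object exposing get_members().

    A PyGithub Organization fits this shape; the helper itself only
    relies on get_members() yielding objects with a .login attribute.
    """
    return {user.login for user in container.get_members()}


def load_organization(token, org_name):
    """Connect to GitHub and return the Organization object.

    Hypothetical helper: `token` and `org_name` are supplied by the caller.
    PyGithub is imported lazily so member_logins() has no dependency on it.
    """
    from github import Github

    return Github(token).get_organization(org_name)
```

Usage would look something like member_logins(load_organization(token, "example-org")), where "example-org" is a placeholder.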
Step 2: Team Membership
Surprise! We instantiated an object to represent the team and used the exact same method to get its members. Remember how we mentioned that teams are subsets of organizations? Well, one nice benefit of that is when a user becomes disassociated from an organization, they’re automatically no longer a member of any teams within that organization.
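A sketch of the team lookup, assuming the team is identified by its slug (the function name and the slug are hypothetical; get_team_by_slug() and get_members() are PyGithub calls). Keeping the user objects keyed by login, rather than just the logins, makes it easy to hand them back to the API later.

```python
def team_members_by_login(org, team_slug):
    """Return (team, {login: user}) for the team with the given slug.

    `org` is expected to behave like a PyGithub Organization.
    """
    team = org.get_team_by_slug(team_slug)
    members = {user.login: user for user in team.get_members()}
    return team, members
```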
Step 2.5: Filtering Team Membership
Okay, we added a little, teeny-tiny bit of complexity here. We got to thinking, “Are there any use cases where we wouldn’t want all members of an organization to become members of a team? What if we only wanted organizational members whose names started with a ‘W’?” So, we added the ability to filter users based on fields associated with their accounts (i.e., not just their username, or “login”). We could provide rules (regular expressions) that would be applied to user records and, depending on whether or not they matched, the users could be included in or excluded from the team.
This was most handy during testing and initial roll-out as it allowed us to send over individual users, then small batches of users (e.g., users with logins that started with A – E), and then all users. One concern was limiting the potential damage if something went wrong and disabled developer access; another was overloading the API with too many team membership additions when the synchronization was first run. Neither turned out to be an actual problem, but we wanted to be prepared just in case.
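One way such a filter could look is sketched below. The rule format (a mapping of field name to regular expression), the function name, and the choice to match case-insensitively are all assumptions made for illustration; the original tool’s rule syntax isn’t shown here.

```python
import re


def matches_rules(user, include=None, exclude=None):
    """Decide whether a user record passes the include/exclude rules.

    `include` and `exclude` map a field name (e.g. "login") to a regex.
    A user is rejected if any exclude rule matches, and otherwise
    accepted only if every include rule matches. Missing or empty
    fields are treated as empty strings.
    """
    include = include or {}
    exclude = exclude or {}

    def field(name):
        return str(getattr(user, name, "") or "")

    def hit(name, pattern):
        return re.search(pattern, field(name), flags=re.IGNORECASE) is not None

    if any(hit(name, pattern) for name, pattern in exclude.items()):
        return False
    return all(hit(name, pattern) for name, pattern in include.items())
```

For the staged roll-out described above, an include rule such as {"login": r"^[a-e]"} would admit only logins starting with A through E.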
Step 3: Apply the Differences
This, too, was pretty straightforward. Iterate through all of the members of the organization; if a user was in the organization but not the team, add them to the team; if a user was not supposed to be in the team (e.g., because of a filter), remove them from the team.
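The comparison boils down to two set differences. The sketch below is one possible shape for it, with hypothetical function names; add_membership() and remove_membership() are PyGithub Team methods, while the dictionaries of users keyed by login are an assumption about how the earlier steps hand over their results.

```python
def plan_changes(org_members, team_members, is_wanted=lambda user: True):
    """Work out who to add to and remove from the team.

    Both arguments map login -> user object. `is_wanted` is an optional
    filter; organization members it rejects are treated as not belonging
    on the team.
    """
    wanted = {login: user for login, user in org_members.items() if is_wanted(user)}
    to_add = [wanted[login] for login in sorted(wanted.keys() - team_members.keys())]
    to_remove = [
        team_members[login] for login in sorted(team_members.keys() - wanted.keys())
    ]
    return to_add, to_remove


def apply_changes(team, to_add, to_remove):
    """Apply a plan to an object that behaves like a PyGithub Team."""
    for user in to_add:
        team.add_membership(user)
    for user in to_remove:
        team.remove_membership(user)
```

Separating the plan from its application keeps the decision logic testable without touching the API.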
Extra Features
While developing the tool, we added a few features that were useful for debugging. First, we added the ability to do a “dry run,” where the tool would tell you what it would do but not actually make any changes. So, we could see who would be added, who would be removed, who wouldn’t be changed, etc. without any risk of anything bad happening. Second, we added configurable diagnostic logging. We use Python’s logging facility to mark some messages as “info” priority, some messages as “debugging” priority, etc. while granting the operator the ability to set the threshold for which messages are displayed. Finally, we added the ability to configure the tool using environment variables. We then used the python-dotenv package to support reading “.env” files from the filesystem (handy when testing the tool locally) while seamlessly allowing us to use os.getenv() to extract values whether they came from the environment or the file.
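Here is a small sketch of how these three features might fit together. The variable names DRY_RUN and LOG_LEVEL, the logger name, and the helper functions are hypothetical; load_dotenv() comes from the python-dotenv package, and the fallback exists only so the sketch still runs where that package isn’t installed.

```python
import logging
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:  # fall back to the plain environment if it is missing
    def load_dotenv(*args, **kwargs):
        return False

log = logging.getLogger("sync")


def read_settings():
    """Read configuration from the environment (and a .env file, if any)."""
    load_dotenv()  # by default, does not override variables already set
    return {
        "dry_run": os.getenv("DRY_RUN", "true").lower() in {"1", "true", "yes"},
        "log_level": os.getenv("LOG_LEVEL", "INFO").upper(),
    }


def configure_logging(level_name):
    """Set the display threshold, e.g. "DEBUG" or "INFO"."""
    logging.basicConfig(level=level_name)


def add_member(team, user, dry_run):
    """Add a user to the team, or only report the action in a dry run."""
    if dry_run:
        log.info("[dry run] would add %s", user.login)
        return False
    team.add_membership(user)
    log.info("added %s", user.login)
    return True
```

Defaulting DRY_RUN to true is a design choice in this sketch: a misconfigured run then reports what it would do instead of changing anything.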
This was pretty helpful for us because not only could we run tests locally without having to set a bunch of variables at runtime, but it also allowed the tool to run in a containerized environment (e.g., under Docker). Our goal was to make the tool compatible with Twelve-Factor App practices.
Running the Tool
Writing the tool was fun and educational. However, we wanted to take it to the next level and actually use it to do the thing it was designed to do. Who knew?!
We didn’t want the tool to run on a developer’s terminal and we certainly didn’t want to stand up and maintain dedicated infrastructure. We just needed a Python script to run every few hours. We thought about running a container on AWS ECS or possibly running a Lambda, but we decided to use an even lighter-weight option: GitHub Actions. We configured a separate repository that would run with a scheduled trigger (every 4 hours in our case), pull down a Docker image of the tool, initialize environment variables with the needed configuration, and then run the container. Not only did we avoid any dedicated infrastructure or any single person being the only person who could access the thing, but everything the tool did and how it did it was logged for future reference. We did have to add a GitHub Action workflow to build and publish an image, but that’s something we already had in our toolbox.
The Results
Since we built and deployed the tool, we now have a GitHub Team that includes all of the members of our GitHub organization. This team – the “everyone” team – is updated automatically every few hours; when someone dissociates from the organization, they’re automatically removed from the team. The primary result of this is that we can now “inner source” repositories on GitHub. That is, we can create private repositories and add the “everyone” team to allow everyone in the organization to access those repositories. This greatly simplifies access control for us as we no longer have individual team maintainers continually asking, “what’s your GitHub ID?” to share guild resources or internal tools. All that’s required is for repository administrators to add the “everyone” team as a collaborator.
The tool’s resources include a repository and a runner.
Wes Dean, a Senior DevSecOps Engineer at Flexion, brings his extensive experience in the UNIX and Linux world since the early 1990s to his role. He supports a variety of U.S. Federal agencies, helping them work safer, faster, more efficiently, and more securely. Wes’s unique position as a member of the CMS Open Source Program Office Advisory Board’s CMS Source Code Stewardship Taskforce underscores his expertise and credibility. He is also a staunch supporter of MegaLinter and a contributor to the tool’s prose scanning functionality, among other improvements.