Megalinter performance tuning for maximum efficiency


November 13, 2024 (updated February 12, 2025)

Wes Dean

Overview

In the first two parts of my MegaLinter series, we explored the basics of getting started with MegaLinter and dove into advanced tips and tricks that help streamline code quality management. We will now crank it up a notch and learn how to fine-tune it further.

My team at Flexion was working on a project that involved a SaaS solution. The project used MegaLinter to help keep the code safe and the quality high. One of our Flexion Fundamentals is to “Never Compromise on Quality,” and MegaLinter helps us stay keenly focused on delivering high-quality code. If something looks suspect, or we do something that isn’t aligned with a best practice, we want to know as soon as possible so we can fix it promptly.

The team was asked to focus primarily on cost-cutting, specifically the amount of billable time GitHub Actions was using. The organization’s sizable monthly quota was being used up before the end of each month, so the team looked for ways to cut back on the billable time spent running GitHub Actions. A welcome side effect of keeping MegaLinter’s runtime down is that our developers spend less time waiting for MegaLinter to scan their code. Compilation time, on the other hand, wasn’t something we looked at for this exercise.

Methodology

First, we looked at the Elapsed Time column reported by the GitHub Comment reporter. The linters that took the longest to run received the most attention; linters that took only a second or two received none. Then we looked at the number in the Found column. If a linter consistently reported hundreds or thousands of findings across many runs and the number didn’t go down, we concluded that its results weren’t being used and that it could likely be disabled. This wasn’t intended as a commentary on the quality of the work or of the scanner; we took the pragmatic view of what yielded the most benefit for the least cost (that is, a return-on-investment decision).
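The Elapsed Time column that drove this analysis is, as far as we know, shown only when MegaLinter is configured to report elapsed times. The post doesn’t show the team’s reporter settings, so the .mega-linter.yml fragment below is an assumption: a minimal sketch that turns on the GitHub comment reporter and elapsed-time reporting.

```yaml
# .mega-linter.yml (illustrative fragment; not the team's actual settings)

# Post the MegaLinter summary table as a comment on the pull request
GITHUB_COMMENT_REPORTER: true

# Include each linter's elapsed time in the reports so slow linters stand out
SHOW_ELAPSED_TIME: true
```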

Changes

These are a few of the changes we made to bring the costs down.

Switch the trigger from push to pull_request

Initially, the team had MegaLinter run whenever developers pushed code to the repo. This helped keep the code in compliance with their coding style guidelines; however, it meant that MegaLinter ran every time anyone pushed anything. We changed it to run only when a pull request targets main or master, or when someone triggers it manually.

This change was made in the project’s .github/workflows/megalinter.yml file:

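```yaml
---
name: "MegaLinter"

# yamllint disable-line rule:truthy
on:
  # Run only for pull requests that target main or master
  pull_request:
    branches:
      - main
      - master
  # Allow manual runs from the Actions tab
  workflow_dispatch:
```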

This greatly reduced the number of times MegaLinter ran, especially when APPLY_FIXES was set. We also found that, with APPLY_FIXES set, every time a linter fixed something the developer had to pull from the repo after MegaLinter finished to pick up the latest changes; if they didn’t, they would receive messages saying that their local branch was out of date.
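For reference, MegaLinter’s GitHub Actions workflow template controls automatic fixes through environment variables like the ones below. This is an illustrative fragment only: the post doesn’t say which values the team settled on. The behavior described above is presumably what happens with APPLY_FIXES_MODE set to commit, because the fixes land as new commits on the developer’s branch.

```yaml
# .github/workflows/megalinter.yml (fragment; values are illustrative)
env:
  # Which linters may apply fixes: "all", "none", or a list of linter keys
  APPLY_FIXES: all
  # Event that triggers applying fixes: pull_request, push, or all
  APPLY_FIXES_EVENT: pull_request
  # "commit" pushes fixes to the branch; "pull_request" opens a separate PR
  APPLY_FIXES_MODE: commit
```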

Switch to a smaller flavor

The team had been using a flavor that included more linters than they actually needed to run. The full v7.12.0 image is about 3.53 GB compressed / 9.6 GB uncompressed, and the flavor they had been using included four linters that, while relevant to the project, weren’t being used. Because the project hadn’t used MegaLinter from the start (i.e., it was adopted after several years of development), there were a bunch (!!!) of linter findings that the team had no intention of addressing. The findings were reasonable and correct; they just weren’t relevant to the project in its current state. The ci_light flavor included the tools they wanted to use (e.g., GitLeaks, Grype, Secretlint, Trivy, and TruffleHog, among others). We considered the security flavor (1.02 GB) but decided against it: there was no infrastructure as code (IaC) in the repo, so there was no need to run KICS, Checkov, TFLint, etc.

As a result, switching from the flavor they had been using to the ci_light flavor cut the compressed size of the image being pulled from 3.53 GB to 0.46 GB. This change was made in the project’s .github/workflows/megalinter.yml file:

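```yaml
- name: MegaLinter
  id: ml
  # The docker:// prefix is required when `uses` points directly at a container image
  uses: docker://ghcr.io/oxsecurity/megalinter-ci_light:v7.12.0
```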
That is, they eliminated the tools they weren’t using and cut the size of the image down by eighty-seven percent.
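As a quick check of the eighty-seven percent figure, here is the relative reduction computed from the compressed image sizes given above:

```latex
\frac{3.53\,\text{GB} - 0.46\,\text{GB}}{3.53\,\text{GB}}
  = \frac{3.07}{3.53} \approx 0.870
  \;\Rightarrow\; \text{an } 87\% \text{ reduction}
```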

Tell GitLeaks to only scan the current commit

GitLeaks is used to detect secrets (credentials, tokens, API keys, passwords, etc.) stored in files in the repository. Generally speaking — and this is just my personal opinion — it’s usually not great to store secrets in the source code for an application.

By default, GitLeaks checks whether the target being scanned is a Git repository (a safe assumption here, since it was running as a GitHub Action and had a .git/ directory). If it is, GitLeaks scans the repository and its entire history for secrets.

Once we established that there were no secrets in the repository’s history, we decided to accept the risk of having GitLeaks scan only the files in the checked-out commit rather than the entire history. We judged the risk acceptable because the project was closed-source, only signed commits were accepted, and the main branch required approved PRs before other branches could be merged into it. This tweak cut GitLeaks’ runtime from 50 seconds to 4 seconds. To implement this decision, we configured MegaLinter to pass the --no-git flag to GitLeaks in the project’s .mega-linter.yml file:

```yaml
# Only scan the files in this commit, not the entire history of the repo
REPOSITORY_GITLEAKS_ARGUMENTS: "--no-git"
```

Only scan updated files

The team’s concern with scanning only updated files was that they wanted the security-related tooling to run on all files, all of the time, not just on updated ones. That way, as the tooling improved and could detect more potentially problematic situations, those situations would be caught anywhere in the codebase.
The security-related scanners we were using were generally in the REPOSITORY_* descriptor (group). Reading the documentation for these linters showed that the ones we were using typically included the following note under “How are identified applicable files”: “If this linter is active, all files will always be linted.”

That is, even with VALIDATE_ALL_CODEBASE set to false, the security linters would still scan every file. The team decided that this was acceptable and updated the .mega-linter.yml file like this:

```yaml
# Only scan updated files
VALIDATE_ALL_CODEBASE: false
```
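One related detail comes from MegaLinter’s own workflow template rather than from anything described in this post. To work out which files changed, MegaLinter needs Git history to compare against, so the template’s checkout step fetches the full history and notes that this line can be removed only when VALIDATE_ALL_CODEBASE is true. The sketch below reproduces that step; the checkout action version is an assumption.

```yaml
# .github/workflows/megalinter.yml (fragment, modeled on MegaLinter's template)
- name: Checkout Code
  uses: actions/checkout@v4
  with:
    # Full history so MegaLinter can determine the updated files
    # when VALIDATE_ALL_CODEBASE is false
    fetch-depth: 0
```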

Other tweaks

We made some other tweaks, such as disabling Trivy-SBOM (we weren’t building anything that would consume an SBOM), and limiting the scope of the formatting linters (jsonlint, v8r, prettier, etc.). However, these changes did not yield a noticeable improvement.
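The post doesn’t show how these tweaks were configured. The .mega-linter.yml sketch below is one way to express them. It assumes REPOSITORY_TRIVY_SBOM is the key for the Trivy SBOM linter in the MegaLinter version in use. The config/ path in the regular expressions is a hypothetical placeholder for whatever subset of JSON files a project actually wants checked.

```yaml
# .mega-linter.yml (sketch; the config/ path is hypothetical)

# Skip SBOM generation, since nothing downstream consumes the SBOM
DISABLE_LINTERS:
  - REPOSITORY_TRIVY_SBOM

# Limit the JSON formatting linters to one (hypothetical) directory
JSON_JSONLINT_FILTER_REGEX_INCLUDE: "(config/)"
JSON_V8R_FILTER_REGEX_INCLUDE: "(config/)"
JSON_PRETTIER_FILTER_REGEX_INCLUDE: "(config/)"
```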

Overall

We made significant optimizations to our MegaLinter setup, improving both speed and efficiency. Here are the key improvements we achieved:

Reduced the number of MegaLinter runs on a typical day of repository activity from 9 => 6 runs (33% improvement)
Cut image download size from 3.53 GB => 0.46 GB (87% improvement)
Cut runtime from 6:58 => 2:33 (63% improvement)
Average daily runtime 1:02:42 => 0:15:18 (76% improvement)
Average monthly billable minutes 1,276 => 304 (76% improvement)
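The figures above are internally consistent. Converting the per-run runtimes to seconds (6:58 = 418 s and 2:33 = 153 s), each daily total is the per-run runtime multiplied by the number of runs, and each percentage is the relative reduction:

```latex
\begin{aligned}
\text{daily}_{\text{before}} &= 9 \times 418\,\text{s} = 3762\,\text{s} = 1{:}02{:}42 \\
\text{daily}_{\text{after}}  &= 6 \times 153\,\text{s} = 918\,\text{s} = 0{:}15{:}18 \\
\text{per run:}\quad & \tfrac{418 - 153}{418} = \tfrac{265}{418} \approx 0.634 \;(63\%) \\
\text{daily:}\quad & \tfrac{3762 - 918}{3762} = \tfrac{2844}{3762} \approx 0.756 \;(76\%) \\
\text{monthly:}\quad & \tfrac{1276 - 304}{1276} = \tfrac{972}{1276} \approx 0.762 \;(76\%)
\end{aligned}
```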

That is, we were able to cut way down on the time spent running MegaLinter while still getting all of the value we’ve come to love, appreciate, and expect over the years. This keeps all of us happy while reducing both developer downtime and billable GitHub Actions minutes.

Looking to save time and maximize efficiency in your development projects? Get in touch with the Flexion team today!

Wes Dean, a Senior DevSecOps Engineer at Flexion, brings his extensive experience in the UNIX and Linux world since the early 1990s to his role. He supports a variety of U.S. Federal agencies, helping them work safer, faster, more efficiently, and more securely. Wes’s unique position as a member of the CMS Open Source Program Office Advisory Board’s CMS Source Code Stewardship Taskforce underscores his expertise and credibility. He is also a staunch supporter of MegaLinter and a contributor to the tool’s prose scanning functionality, among other improvements.
