Exposed Secrets: The Security Risks of Public GitLab Repositories

The Revelation of Exposed Secrets in GitLab Repositories

After scanning 5.6 million public repositories on GitLab Cloud, a security researcher found more than 17,000 exposed secrets spanning over 2,800 unique domains.

Using the open-source tool TruffleHog, Luke Marshall scanned the code in these repositories for sensitive credentials such as API keys, passwords, and tokens.

Marshall had previously run a similar scan of Bitbucket, where he found 6,212 secrets across 2.6 million repositories. He also examined the Common Crawl dataset, which is used to train AI models, and found 12,000 valid secrets.

GitLab is a web-based Git platform popular among software developers, maintainers, and DevOps teams for code hosting, CI/CD, collaborative development, and repository management.

Marshall used a public GitLab API endpoint to enumerate every public GitLab Cloud repository. A custom Python script paginated through the results, sorted by project ID, and produced a list of 5.6 million unique repositories.
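The enumeration step can be sketched roughly as follows. This is not Marshall's script, which the article does not reproduce. It is a minimal illustration that assumes the GitLab REST API's `/projects` endpoint with its `visibility`, `order_by`, `sort`, `id_after`, and `per_page` parameters. The function names `fetch_page` and `iter_public_projects` are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator, List

# Assumption: the public projects endpoint of GitLab Cloud (gitlab.com).
API_URL = "https://gitlab.com/api/v4/projects"


def fetch_page(id_after: int, per_page: int = 100) -> List[Dict]:
    """Fetch one page of public projects whose ID is greater than id_after."""
    query = urllib.parse.urlencode({
        "visibility": "public",
        "order_by": "id",
        "sort": "asc",
        "id_after": id_after,
        "per_page": per_page,
        "simple": "true",
    })
    with urllib.request.urlopen(f"{API_URL}?{query}", timeout=30) as resp:
        return json.load(resp)


def iter_public_projects(
    fetch: Callable[[int], List[Dict]] = fetch_page,
    start_after: int = 0,
) -> Iterator[Dict]:
    """Walk all public projects in ascending ID order.

    Using the last seen ID as the cursor means every project is
    visited once, so the resulting list contains no duplicates.
    """
    last_id = start_after
    while True:
        page = fetch(last_id)
        if not page:
            return  # an empty page means there is nothing left
        for project in page:
            yield project
        last_id = page[-1]["id"]
```

Sorting by project ID and resuming from the last ID seen keeps the walk stable even if new repositories are created while the script runs. A real run would also need rate limiting and retries, which are omitted here.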

The repository names were then sent to an AWS Simple Queue Service (SQS) queue, from which an AWS Lambda function took each repository name, ran TruffleHog against it, and logged the results.
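A worker of this kind might look roughly like the sketch below. It is an illustration under stated assumptions, not the code Marshall ran: it assumes each SQS message body holds a repository path such as `group/project`, that the `trufflehog` binary is available in the Lambda environment, and that the installed version accepts the `--results=verified` and `--json` flags. All function names are hypothetical.

```python
import json
import subprocess
from typing import Dict, List

# Assumption: repositories live on GitLab Cloud and are cloned over HTTPS.
GITLAB_BASE = "https://gitlab.com"


def repo_paths_from_event(event: Dict) -> List[str]:
    """Extract message bodies from an SQS-triggered Lambda event."""
    return [record["body"].strip() for record in event.get("Records", [])]


def build_command(repo_path: str) -> List[str]:
    """Build the TruffleHog command line for one repository.

    Assumption: check these flags against the installed TruffleHog version.
    """
    clone_url = f"{GITLAB_BASE}/{repo_path}.git"
    return ["trufflehog", "git", clone_url, "--results=verified", "--json"]


def parse_findings(stdout: str) -> List[Dict]:
    """Parse JSON-lines output, skipping anything that is not a JSON object."""
    findings = []
    for line in stdout.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            findings.append(json.loads(line))
        except json.JSONDecodeError:
            continue
    return findings


def handler(event, context):
    """Lambda entry point: scan every repository named in the batch."""
    summary = {}
    for repo_path in repo_paths_from_event(event):
        proc = subprocess.run(
            build_command(repo_path),
            capture_output=True,
            text=True,
            check=False,  # a failed scan should not abort the whole batch
        )
        findings = parse_findings(proc.stdout)
        # Printed lines end up in the function's logs.
        print(json.dumps({"repo": repo_path, "findings": len(findings)}))
        summary[repo_path] = len(findings)
    return summary
```

Decoupling enumeration from scanning through a queue lets many workers run in parallel and lets a failed scan be retried without restarting the whole job.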

“Each Lambda invocation executed a simple TruffleHog scan command with concurrency set to 1000,” Marshall explained. This setup let him scan all 5.6 million repositories in just over 24 hours.

Scanning every public GitLab Cloud repository this way cost $770 in total.
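As a back-of-the-envelope check, the rounded figures reported here imply a throughput of about 65 repositories per second and a cost of roughly 14 cents per 1,000 repositories. The short calculation below uses only those rounded figures and treats the runtime as exactly 24 hours, so the throughput is a slight overestimate.

```python
# Rounded figures from the article; the scan took "just over" 24 hours,
# so the real throughput is slightly lower than computed here.
REPOS_SCANNED = 5_600_000
SCAN_HOURS = 24
TOTAL_COST_USD = 770

repos_per_second = REPOS_SCANNED / (SCAN_HOURS * 3600)
cost_per_1000_repos = TOTAL_COST_USD / REPOS_SCANNED * 1000

print(f"Throughput: about {repos_per_second:.0f} repositories per second")
print(f"Cost: about ${cost_per_1000_repos:.2f} per 1,000 repositories")
```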

The scan found 17,430 verified live secrets, nearly three times the number found on Bitbucket, at a 35% higher secret density (secrets per repository).
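The comparison with Bitbucket can be reproduced approximately from the figures quoted in this article. With the rounded repository counts (5.6 million and 2.6 million), the count ratio comes out at about 2.8 and the density gap at about 30%. The reported 35% presumably rests on exact repository counts that are not given here, so the sketch below is only a sanity check.

```python
# Figures as quoted in the article; repository counts are rounded.
GITLAB_SECRETS, GITLAB_REPOS = 17_430, 5_600_000
BITBUCKET_SECRETS, BITBUCKET_REPOS = 6_212, 2_600_000

count_ratio = GITLAB_SECRETS / BITBUCKET_SECRETS

# Secret density = secrets per repository.
gitlab_density = GITLAB_SECRETS / GITLAB_REPOS
bitbucket_density = BITBUCKET_SECRETS / BITBUCKET_REPOS
density_increase = gitlab_density / bitbucket_density - 1

print(f"GitLab has {count_ratio:.2f}x as many secrets as Bitbucket")
print(f"GitLab: {gitlab_density * 1000:.2f} secrets per 1,000 repositories")
print(f"Bitbucket: {bitbucket_density * 1000:.2f} secrets per 1,000 repositories")
print(f"Density difference: {density_increase:.0%}")
```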

Historical data showed that most of the leaked secrets date from after 2018. However, Marshall also found some very old secrets, dating back to 2009, that are still valid today.

Volume of exposed secrets
Source: Truffle Security

The largest group of leaked secrets, more than 5,200, consisted of Google Cloud Platform (GCP) credentials, followed by MongoDB keys, Telegram bot tokens, and OpenAI keys.

Marshall also found just over 400 GitLab keys leaked in the scanned repositories.

Types of exposed secrets on GitLab
Source: Truffle Security

Following responsible disclosure practices, and because the discovered secrets were associated with 2,804 unique domains, Marshall automated the process of notifying the affected parties. He used Claude Sonnet 3.7 alongside a Python script with web search capabilities to generate the emails.

In the process, the researcher received multiple bug bounties totaling $9,000.

While many organizations revoked their exposed secrets promptly after being notified, an undisclosed number of secrets remain exposed on GitLab.
