Reposcanner

Written by James Thomas

August 2, 2017

Reposcanner is a Python script, inspired by truffleHog, that scans Git repositories for interesting strings such as API keys or hard-coded passwords. Sensitive information like this often gets committed during the early stages of development (or by accident), and is generally removed before the application or source code is released. However, because Git keeps a history of every change, scanning back through the commit history can recover information that has been removed from the latest version. The basic flow of reposcanner is as follows:

Try to clone the repository if a remote URL is given
Get the active branch (or all branches if the -a option is given)
Create a diff for each commit in the selected branch(es)
Ignore any known boring string patterns and file names/extensions
Extract any long hexadecimal or base64 strings
Calculate the entropy of these strings
If the entropy is high enough, and it’s not been seen before, store the string
Output all strings that are found
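To make the flow above more concrete, here is a minimal Python sketch of steps 1 to 3 and 5 to 8. It is not reposcanner’s actual code: it shells out to the `git` command line rather than using a Git library, it leaves out the filtering of boring patterns, and the function names (`clone`, `git_log_patches`, `added_lines`, `find_secrets`) as well as the length and entropy thresholds are hypothetical values chosen for illustration. It assumes Shannon entropy, measured in bits per character, and only looks at added lines, since any secret that was later removed must have been added in some earlier commit.

```python
import math
import re
import subprocess
from collections import Counter

# Hypothetical thresholds for illustration; reposcanner's tuned values may differ.
MIN_LENGTH = 20
MIN_ENTROPY = {"hex": 3.0, "base64": 4.5}
HEX_RE = re.compile(r"\b[0-9a-fA-F]{%d,}\b" % MIN_LENGTH)
BASE64_RE = re.compile(r"[A-Za-z0-9+/]{%d,}={0,2}" % MIN_LENGTH)


def clone(url, dest):
    """Step 1: clone a remote repository into dest."""
    subprocess.run(["git", "clone", "--quiet", url, dest], check=True)


def git_log_patches(repo_path, all_branches=False):
    """Steps 2-3: patch text for every commit, oldest first, on HEAD (or all branches)."""
    args = ["git", "-C", repo_path, "log", "--reverse", "-p", "--no-color",
            "--src-prefix=a/", "--dst-prefix=b/", "--format=commit %H"]
    if all_branches:
        args.append("--all")
    result = subprocess.run(args, capture_output=True, check=True,
                            encoding="utf-8", errors="replace")
    return result.stdout


def added_lines(log_text):
    """Yield (commit, path, line) for each line added in the patch text."""
    commit = path = None
    for raw in log_text.splitlines():
        if raw.startswith("commit "):
            commit, path = raw[len("commit "):], None
        elif raw.startswith("+++ "):
            # "+++ b/<path>" names the new file; "+++ /dev/null" means it was deleted.
            # (An added line whose content starts with "++ " is misread here.)
            target = raw[4:]
            path = target[2:] if target.startswith("b/") else None
        elif raw.startswith("+") and path is not None:
            yield commit, path, raw[1:]


def shannon_entropy(s):
    """Shannon entropy of s in bits per character."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values()) if n else 0.0


def find_secrets(log_text):
    """Yield (commit, path, token, entropy) for long, high-entropy, unseen tokens."""
    seen = set()
    for commit, path, line in added_lines(log_text):
        for kind, regex in (("hex", HEX_RE), ("base64", BASE64_RE)):
            for token in regex.findall(line):
                if token in seen:
                    continue  # only report each string the first time it appears
                entropy = shannon_entropy(token)
                if entropy >= MIN_ENTROPY[kind]:
                    seen.add(token)
                    yield commit, path, token, entropy
```

For example, looping over `find_secrets(git_log_patches(".", all_branches=True))` and printing each tuple would list findings for the repository in the current directory. Because the log is read oldest first, the commit reported for each string is the one that introduced it. Raising the values in `MIN_ENTROPY` trades a higher chance of missing a secret for fewer false positives.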

The hardest step is identifying “interesting” strings without ending up with too many false positives. Reposcanner ignores some known patterns of boring strings, and if you’re getting too many false positives you can tweak the minimum entropy to report (the current value was arrived at through some trial and error). A possible future option is to also search for interesting strings, such as “api_key = foo” or connection strings. Unlike truffleHog, which shows the entire diff to give context, reposcanner has much more concise output: it shows only the relevant line, along with the commit information, so that you can examine the commit yourself if the string looks interesting. This makes the output much more manageable, especially when scanning larger repositories. Scanning some randomly selected repositories on GitHub turned up the expected interesting strings, including:
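As a rough illustration of ignoring boring matches and producing one concise line per finding, the sketch below checks a file path and line against a few example patterns and formats the result with the short commit hash, date, author and file. The suffixes, regular expressions, the `Finding` fields and the 120-character limit are all hypothetical choices for this example, not reposcanner’s real lists or output format.

```python
import re
from dataclasses import dataclass

# Hypothetical examples of "boring" files and strings; reposcanner's real lists differ.
BORING_SUFFIXES = (".min.js", ".svg", ".png", ".jpg", ".lock")
BORING_PATTERNS = [
    re.compile(r"sha(1|256|384|512)-"),         # subresource integrity hashes
    re.compile(r"data:image/[a-z+]+;base64,"),  # images embedded inline as base64
]


@dataclass
class Finding:
    commit: str
    date: str
    author: str
    path: str
    line: str


def is_boring(path: str, line: str) -> bool:
    """True if the file or the line matches a known uninteresting pattern."""
    if path.lower().endswith(BORING_SUFFIXES):
        return True
    return any(p.search(line) for p in BORING_PATTERNS)


def format_finding(f: Finding, max_len: int = 120) -> str:
    """One concise line: short hash, date, author, file and the (trimmed) line itself."""
    text = f.line.strip()
    if len(text) > max_len:
        text = text[: max_len - 3] + "..."
    return f"{f.commit[:8]} {f.date} {f.author} {f.path}: {text}"
```

Keeping each finding to a single line is what makes the report easy to skim; the short hash is enough to run `git show` on the commit when a string looks worth a closer look.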

API keys for third-party services, in an employee financial bonus scheme
Application and database passwords
A SQL database backup

This is a serious risk for companies when internally developed projects are released to the public: developers tend to be less careful with their commits to an internal project than to a publicly available one, which increases the likelihood of inappropriate files making their way into version control. Unprofessional code, comments or commit messages can also reflect badly on a company; a message like “accidentally deleted database” doesn’t tend to inspire confidence.

Going through the commit history and trying to sanitise it is likely to be infeasible for anything but a trivially sized repository, so the approach most organisations take is simply to wipe the commit history completely, either by creating a fresh repository and copying the files into it, or by destroying the entire history with a rebase. While this provides a degree of protection against inappropriate commit messages or data being leaked, it also destroys the development history of the repository, which is very valuable to developers trying to fix bugs or to understand why certain decisions were made during development. As always, it’s a trade-off between security and convenience.

Besides destroying the repository history, the best thing you can do to protect against these issues is to follow secure development practices from the start, even for projects that you never expect to release. This includes making sure that sensitive information is never committed to source control and, of course, trying to keep comments and commit messages (reasonably) professional.

The Reposcanner code is available on the Dionach GitHub at https://github.com/Dionach/reposcanner. Pull requests are welcome, as always.
