Reposcanner

Written by Dionach by Nomios

August 2, 2017

Reposcanner is a Python script, inspired by truffleHog, that scans Git repositories for interesting strings such as API keys or hard-coded passwords. Sensitive information like this often gets committed in the earlier stages of development (or by accident), and is generally removed before the application or source code is released. However, since Git keeps a history of all changes, we can scan back through the commit history to recover information that has been removed from the latest version. The basic flow of Reposcanner is as follows:

Try to clone the repository if a remote URL is given
Get the active branch (or all branches if the -a option is given)
Create a diff for each commit in the selected branch(es)
Ignore any known boring string patterns and file names/extensions
Extract any long hexadecimal or base64 strings
Calculate the entropy of these strings
If the entropy is high enough, and it’s not been seen before, store the string
Output all strings that are found
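
As a rough illustration of the extraction and entropy steps above, here is a minimal sketch. It is not Reposcanner's actual code: the regular expressions, the 20-character minimum length, the 3.5 threshold and the function names are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Assumed patterns and threshold, for illustration only.
HEX_RE = re.compile(r"\b[0-9a-fA-F]{20,}\b")
B64_RE = re.compile(r"[A-Za-z0-9+/]{20,}={0,2}")
MIN_ENTROPY = 3.5  # bits per character


def shannon_entropy(s):
    """Return the Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(s).values())


def find_candidates(line, seen, min_entropy=MIN_ENTROPY):
    """Return new high-entropy hex/base64 strings found in one diff line.

    `seen` is a set shared across calls so each string is reported once.
    """
    found = []
    for pattern in (HEX_RE, B64_RE):
        for candidate in pattern.findall(line):
            if candidate in seen:
                continue  # already reported (or matched by the other pattern)
            if shannon_entropy(candidate) >= min_entropy:
                seen.add(candidate)
                found.append(candidate)
    return found
```

Calling `find_candidates` on each added line of each diff, with a single shared `seen` set, gives the "store it if it's new and random-looking enough" behaviour described in the steps. A string made of one repeated character has an entropy of zero and is dropped, whatever its length.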

The hardest step is identifying “interesting” strings without ending up with too many false positives. Reposcanner has some known patterns of boring strings which it ignores, and you can tweak the minimum entropy to report if you’re getting too many false positives (the current value was obtained through some trial and error). A possible future option might be to also search for interesting strings (such as “api_key = foo”, or connection strings). Unlike truffleHog, which shows the entire diff to give context, Reposcanner has a much more concise output: it only shows you the relevant line, along with the commit information so that you can go and examine the commit yourself if the string looks interesting. This makes the output much more manageable, especially when scanning larger repositories. Scanning some randomly selected repositories on GitHub turned up the expected interesting strings, including:

API keys for third-party services, in an employee financial bonus scheme
Application and database passwords
A SQL database backup
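
The idea mentioned earlier of searching for strings like “api_key = foo” could be sketched along the following lines. This is a hypothetical example, not a Reposcanner feature: the keyword list, the regular expression and the function name are all assumptions, and a real implementation would need tuning to keep false positives down.

```python
import re

# Hypothetical keyword list; not taken from Reposcanner itself.
KEYWORD_RE = re.compile(
    r"(?i)(api[_-]?key|secret|password|passwd|token)\b\s*[:=]\s*['\"]?([^\s'\"]+)"
)


def find_keyword_assignments(line):
    """Return (name, value) pairs for assignments that look like credentials."""
    return [(m.group(1), m.group(2)) for m in KEYWORD_RE.finditer(line)]
```

Unlike the entropy check, this would also catch short or low-entropy values such as a weak password, at the cost of matching plenty of harmless assignments.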

Leaks like these are a serious risk for companies when internally developed projects are released to the public: developers are less likely to be careful with their commits to an internal project than to a publicly available one, and this increases the likelihood of inappropriate files making their way into the version control system. It can also reflect badly on a company if the repository contains unprofessional code, comments or commit messages. A message like “accidentally deleted database” doesn’t tend to inspire confidence.

Going through the commit history and trying to sanitise it is likely to be unfeasible unless the repository is trivially small, so the approach that most organisations take is to wipe the commit history completely, either by creating a fresh repo and copying the files into it, or by destroying the entire history with a rebase. While this provides a degree of protection against inappropriate commit messages or leaked data, it also destroys the development history of the repository, which is very valuable to developers when trying to fix bugs, or to understand why certain decisions were made during development. As always, it’s a trade-off between security and convenience.

Besides destroying the repo history, the best thing that you can do to protect against these issues is to have secure development practices from the start, even for projects that you never anticipate releasing. This should include making sure that sensitive information is never committed into source control and, of course, trying to keep comments and commit messages (reasonably) professional.

The Reposcanner code is available on the Dionach GitHub at https://github.com/Dionach/reposcanner, and pull requests are welcome as always.
