Reposcanner

Reposcanner is a Python script, inspired by truffleHog, that scans Git repositories for interesting strings such as API keys or hard-coded passwords. Sensitive information like this often gets included in the earlier stages of the development process (or accidentally), and is generally removed before the application or source code is released. However, since Git keeps a history of all changes, we can go back through the commit history to recover information that has been removed from the latest version. The basic flow of reposcanner is as follows:

  • Try to clone the repository if a remote URL is given
  • Get the active branch (or all branches if the -a option is given)
  • Create a diff for each commit in the selected branch(es)
  • Ignore any known boring string patterns and file names/extensions
  • Extract any long hexadecimal or base64 strings
  • Calculate the entropy of these strings
  • If the entropy is high enough, and it’s not been seen before, store the string
  • Output all strings that are found
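
The extraction, entropy and de-duplication steps above can be sketched roughly as follows. This is a minimal illustration rather than Reposcanner's actual code: the function names, the regular expressions, the minimum length and the entropy thresholds are all assumptions chosen for the example.

```python
import math
import re

# Assumed values for illustration only; the real script's settings may differ.
MIN_LENGTH = 20
HEX_MIN_ENTROPY = 3.0      # hex strings can reach at most 4.0 bits per character
BASE64_MIN_ENTROPY = 4.5   # base64 strings can reach at most 6.0 bits per character

HEX_RE = re.compile(r"[0-9a-fA-F]{%d,}" % MIN_LENGTH)
BASE64_RE = re.compile(r"[A-Za-z0-9+/=]{%d,}" % MIN_LENGTH)


def shannon_entropy(s):
    """Return the Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    counts = {}
    for ch in s:
        counts[ch] = counts.get(ch, 0) + 1
    length = len(s)
    return -sum((n / length) * math.log2(n / length) for n in counts.values())


def find_candidates(line, seen):
    """Return high-entropy hex/base64 strings in line that are not in seen.

    Newly reported strings are added to seen so they are only stored once.
    """
    found = []
    for pattern, threshold in ((HEX_RE, HEX_MIN_ENTROPY),
                               (BASE64_RE, BASE64_MIN_ENTROPY)):
        for candidate in pattern.findall(line):
            if candidate in seen:
                continue
            if shannon_entropy(candidate) >= threshold:
                seen.add(candidate)
                found.append(candidate)
    return found
```

In use, a single `seen` set would be shared across every diff line in the repository, so a key that appears in many commits is only reported the first time it is encountered.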

 
The hardest step is identifying “interesting” strings without ending up with too many false positives. Reposcanner ignores some known patterns of boring strings, and you can tweak the minimum entropy required for a string to be reported if you’re getting too many false positives (the current value was obtained through some trial and error). A possible future option might be to also search for interesting strings by pattern (such as “api_key = foo”, or connection strings).

Unlike truffleHog, which shows the entire diffs to give context, reposcanner has a much more concise output, which only shows you the relevant line, along with the commit information so that you can go and examine the commit yourself if the string looks interesting. This makes the output much more manageable, especially when scanning larger repositories. Scanning some randomly selected repositories on GitHub resulted in the expected interesting strings, including:

  • API keys for third-party services, in an employee financial bonus scheme
  • Application and database passwords
  • A SQL database backup
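
Findings such as passwords are often short, low-entropy strings, which is where the possible future option mentioned earlier (searching for patterns such as “api_key = foo”) could help. The following is only a sketch of that idea, not something Reposcanner is stated to do: the keyword list, the regular expression and the function name are hypothetical.

```python
import re

# Hypothetical keyword list for illustration; not taken from Reposcanner.
KEYWORD_RE = re.compile(
    r"(api_?key|secret|passw(?:or)?d|token)\s*[:=]\s*['\"]?([^'\"\s]+)",
    re.IGNORECASE,
)


def find_keyword_assignments(line):
    """Return (keyword, value) pairs for assignments to suspicious names."""
    return [(m.group(1), m.group(2)) for m in KEYWORD_RE.finditer(line)]
```

A pattern like this is likely to produce false positives of its own (for example, placeholder values in sample configuration files), so it would probably need its own list of boring values to ignore.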

 
This is a serious risk for companies when internally developed projects are released to the public – developers are less likely to be careful with their commits to an internal project compared to a publicly available one, and this increases the likelihood of inappropriate files making their way into the version control system. It can also reflect badly on a company if you have unprofessional code, comments or commit messages – a message like “accidentally deleted database” doesn’t tend to inspire confidence.

Going through the commit history and trying to sanitise is likely to be unfeasible, unless it’s a trivially sized repository, so the approach that most organisations take is just to completely wipe the commit history – either by creating a fresh repo and copying the files into it, or destroying the entire history with a rebase. While this provides a degree of protection from inappropriate commit messages or data being leaked, it does also destroy the development history of the repository, which is very valuable to developers when trying to fix bugs, or to understand why certain decisions have been made in the development process. As always, it’s the trade-off between security and convenience.

Besides destroying the repo history, the best thing that you can do to protect against these issues is to have secure development practices from the start, even for projects that you never anticipate releasing. This should include making sure that sensitive information is never committed into source control and, of course, trying to keep comments and commit messages (reasonably) professional.

The Reposcanner code is available on the Dionach GitHub at https://github.com/Dionach/reposcanner – pull requests are welcome as always.

