Information extracted from online documents

10 Feb

Hacking in the movies happens at breakneck speed. Someone needs access to a database or internal system holding confidential data, and the “genius coder” sends their fingers flying across the keyboard before, seconds later, delivering the painfully clichéd line “I’m in”. Real-life hacking, whether performed during a sanctioned penetration test or a genuine attack, simply does not happen like this. Penetration testers and “black hats” alike typically follow a cyclic, multi-step methodology made up of information gathering, scanning, exploitation and maintaining access. Each of these phases involves multiple tasks that are often extensive and laborious. Exploitation tends to attract the most coverage, because it is the stage at which a host is actually compromised, but many in the security industry agree that information gathering is the phase that does most to determine whether a penetration will succeed.

During the information gathering stage, an attacker will attempt to uncover important details about their target. Many organisations unknowingly make this easier by needlessly giving away sensitive information in the documents they host, post or exchange online. The applications that create the file types we all routinely use, such as PDF, DOC, PPT, XLS and JPEG, embed metadata and hidden information in each file. Left unsanitized, documents that seem to contain only innocuous information can in fact be laced with vital clues about your network environment and security posture.
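To make this concrete for the Office formats: a modern .docx, .xlsx or .pptx file is a ZIP package, and its author details sit in an XML part called `docProps/core.xml` (the older DOC, XLS and PPT formats store similar properties in a different, binary container). The Python sketch below reads a few of those fields using only the standard library. The names `read_core_properties`, `make_sample_package` and `SAMPLE_CORE` are hypothetical, and the sample package is a minimal stand-in holding only the metadata part, not a complete Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by docProps/core.xml in Office Open XML packages
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Hypothetical sample metadata standing in for a real document's core.xml
SAMPLE_CORE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:creator>jsmith</dc:creator>
  <cp:lastModifiedBy>Jane Smith</cp:lastModifiedBy>
  <dcterms:created xsi:type="dcterms:W3CDTF">2014-02-03T09:15:00Z</dcterms:created>
</cp:coreProperties>"""


def make_sample_package() -> bytes:
    """Build an in-memory ZIP holding only docProps/core.xml."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("docProps/core.xml", SAMPLE_CORE)
    return buf.getvalue()


def read_core_properties(package: bytes) -> dict:
    """Return the author-related core properties found in an OOXML package."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    fields = {
        "creator": "dc:creator",
        "lastModifiedBy": "cp:lastModifiedBy",
        "created": "dcterms:created",
        "modified": "dcterms:modified",
    }
    found = {}
    for name, path in fields.items():
        element = root.find(path, NS)
        if element is not None and element.text:
            found[name] = element.text
    return found


if __name__ == "__main__":
    # → {'creator': 'jsmith', 'lastModifiedBy': 'Jane Smith', 'created': '2014-02-03T09:15:00Z'}
    print(read_core_properties(make_sample_package()))
```

Passing the bytes of a real .docx (for example `open("report.docx", "rb").read()`) to `read_core_properties` would show that document's own values, which are often a real username and full name.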

Tools such as FOCA (Fingerprinting Organizations with Collected Archives) and Metagoofil are very effective at revealing these snippets of information, and both are freely available to download. Given a domain name, they use specially crafted search-engine queries to find the documents hosted on that site. Those documents, along with any other files you supply, can then be run through the built-in metadata extractor, which quickly produces an easy-to-interpret analysis of the findings. So what information can be lurking behind the scenes in these files, and what can these tools recover? Quite a lot, actually: host names, IP addresses, the types and versions of operating systems and software deployed on your network, geolocation data, usernames and email addresses, and even the odd password. Such knowledge pays dividends for an attacker in the later stages of a physical, social engineering or electronic engagement.
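Neither tool's code is reproduced here, but the two halves of the workflow described above can be sketched in a few lines of Python. `build_queries` assembles one query per file type using the `site:` and `filetype:` operators that major search engines support; actually fetching results and downloading files is left out. `summarise` then scans extracted metadata strings for email addresses, Windows profile paths (which leak usernames) and UNC paths (which leak internal host names). All names, patterns and sample values are illustrative assumptions, and the regular expressions are deliberately simplified.

```python
import re

# File types worth searching for; an illustrative list, not the tools' own
FILE_TYPES = ["pdf", "doc", "docx", "ppt", "pptx", "xls", "xlsx"]

# Simplified patterns for the kinds of clues described above
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# C:\Users\<name>\... or C:\Documents and Settings\<name>\... reveal usernames
USER_PATH_RE = re.compile(
    r"[A-Za-z]:\\(?:Users|Documents and Settings)\\([^\\]+)\\", re.IGNORECASE
)
# \\<host>\share\... UNC paths reveal internal host names
UNC_HOST_RE = re.compile(r"\\\\([A-Za-z0-9._-]+)\\")


def build_queries(domain, file_types=FILE_TYPES):
    """One search-engine query per file type, restricted to the target site."""
    return [f"site:{domain} filetype:{ext}" for ext in file_types]


def summarise(metadata_values):
    """Collect emails, usernames and host names from extracted metadata strings."""
    findings = {"emails": set(), "usernames": set(), "hosts": set()}
    for value in metadata_values:
        findings["emails"].update(EMAIL_RE.findall(value))
        findings["usernames"].update(USER_PATH_RE.findall(value))
        findings["hosts"].update(UNC_HOST_RE.findall(value))
    return findings


# Hypothetical metadata values of the sort an extractor might return
SAMPLE_VALUES = [
    "Jane Smith <jane.smith@example.com>",
    r"C:\Users\jsmith\Desktop\budget.xlsx",
    r"\\fileserver01\projects\plan.docx",
    "Microsoft Office Word 2007",
]

if __name__ == "__main__":
    print(build_queries("example.com", ["pdf", "docx"]))
    print(summarise(SAMPLE_VALUES))
```

Even this toy version turns four innocuous-looking strings into a username, an email address and an internal file server name, which is the kind of aggregation the real tools perform at scale.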

Fortunately, this information is easy to remove, so organisations needn’t hand over such important data so readily. Most word processing and office applications provide a built-in facility to do this automatically.

The following link shows an example using Microsoft Office 2010 and 2013:
https://office.microsoft.com/en-gb/word-help/remove-hidden-data-and-pers...

Follow this link if you or your company use Microsoft Office 2007:
http://office.microsoft.com/en-us/excel-help/remove-hidden-data-and-pers...

A quick query using your favourite search engine will display links to similar how-to pages for other word processing and general office applications.
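For teams that want to script a check into a publishing process, the sketch below shows the idea for Office Open XML files: it copies every part of the package unchanged except `docProps/core.xml`, from which it removes author-related fields. It is a minimal illustration, not a replacement for the built-in tools linked above. The choice of fields in `PERSONAL_FIELDS` is an assumption, and the sketch ignores other places where information hides, such as `docProps/app.xml`, comments, tracked changes and metadata in embedded images. The function and sample names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
# Register the usual prefixes so the rewritten XML keeps them
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

# Core properties this sketch strips; an assumption, adjust to your own policy
PERSONAL_FIELDS = ["dc:creator", "cp:lastModifiedBy", "cp:lastPrinted"]

# Hypothetical sample metadata for demonstration
SAMPLE_CORE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:creator>jsmith</dc:creator>
  <cp:lastModifiedBy>Jane Smith</cp:lastModifiedBy>
  <dcterms:created xsi:type="dcterms:W3CDTF">2014-02-03T09:15:00Z</dcterms:created>
</cp:coreProperties>"""


def make_sample_package() -> bytes:
    """In-memory ZIP with core.xml plus a placeholder standing in for the body."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("docProps/core.xml", SAMPLE_CORE)
        zf.writestr("word/document.xml", "<placeholder/>")
    return buf.getvalue()


def scrub_core_properties(package: bytes) -> bytes:
    """Return a copy of an OOXML package with PERSONAL_FIELDS removed from core.xml."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, zipfile.ZipFile(out, "w") as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "docProps/core.xml":
                root = ET.fromstring(data)
                for path in PERSONAL_FIELDS:
                    for element in root.findall(path, NS):
                        root.remove(element)
                data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
            # Reusing the original ZipInfo keeps each part's name and compression type
            dst.writestr(item, data)
    return out.getvalue()


if __name__ == "__main__":
    cleaned = scrub_core_properties(make_sample_package())
    with zipfile.ZipFile(io.BytesIO(cleaned)) as zf:
        print(zf.read("docProps/core.xml").decode("utf-8"))
```

Registering the prefixes is not cosmetic: the `xsi:type` value `dcterms:W3CDTF` refers to the `dcterms` prefix by name, and if ElementTree were left to invent `ns0`-style prefixes on output, that reference would no longer resolve.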

Organisations would be wise to ensure that the documents they host and distribute, both internally and externally, are effectively sanitized before publication or exchange. Making staff aware of the information they could be giving away, developing policies and providing appropriate training are good ways to achieve this. Information gathering is often a long and complex process, and there are many ways for an attacker to gain clues about you and your company. Sanitizing your documents simply makes their job a little harder.

Posted by Alex
