Word Documents Hide Information

August 2003

Microsoft Word documents may contain hidden sensitive information, especially if the document has been revised many times or several people have worked on it. The hidden text can include:

* Text from other documents open at the same time
* Previously deleted text
* E-mail headers and server information
* Printer names
* Data about the machine where the document was written
* Where the document was saved
* Word version number and document format
* Names and usernames of document authors

Many versions of Microsoft Office programs, including Word, Excel and PowerPoint, have a feature that means fragments of data from text you deleted, or from other files you were working on at the same time, can end up hidden in any document you save. Microsoft refers to this kind of data as metadata.

This could be embarrassing for any home workers whose colleagues find out that they have been applying for jobs while working at home or being less than complimentary about their co-workers.

Look and learn

With the right tools, this hidden data can easily be extracted.

Unix and Linux users can use tools such as Antiword and Catdoc to convert a document, including its formatting information, into a plain text file.
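To give a rough idea of why such tools work, here is a minimal Python sketch of something much cruder than Antiword or Catdoc, which actually parse the document's internal structure. It simply scans a file's raw bytes for runs of readable text, much like the Unix strings utility. Assuming an older Word 97-2003 .doc file, it looks for plain single-byte characters and for UTF-16 little-endian text, the two ways such files store words. The function name extract_strings and the six-character minimum are arbitrary choices for this illustration, and the scan will miss accented or non-English text. Anything still sitting in the file's bytes, including fragments that Word no longer displays, can show up in its output.

```python
import re
import sys

# Minimum run length, in characters, before a sequence is reported.
# Arbitrary choice: shorter runs are mostly noise from binary structures.
MIN_LEN = 6


def extract_strings(data: bytes, min_len: int = MIN_LEN) -> list[str]:
    """Return runs of printable text found in raw file bytes.

    This is a crude strings-style scan, not a real .doc parser.
    It checks two encodings that Word 97-2003 files commonly use:
    single-byte text and UTF-16 little-endian text.
    """
    found = []

    # Single-byte runs: printable ASCII characters (space through tilde).
    ascii_pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for m in ascii_pattern.finditer(data):
        found.append(m.group().decode("ascii"))

    # UTF-16LE runs: each printable ASCII character followed by a zero byte.
    utf16_pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    for m in utf16_pattern.finditer(data):
        found.append(m.group().decode("utf-16-le"))

    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Read the whole file as bytes and print every text run found.
    with open(sys.argv[1], "rb") as f:
        for s in extract_strings(f.read()):
            print(s)
```

Saved as a script, it could be run as python extract_strings.py report.doc, where both file names are hypothetical. Searching the output for names, paths or printer names is a quick way to see how much a document gives away.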

Computer researcher Simon Byers has conducted a survey of Word documents available on the net and found that many of them contain sensitive information.

He gathered about 100,000 Word documents from sites on the web, and every single one of them had hidden information.

In a research paper about the work, Mr Byers wrote that about half of the documents gathered had up to 50 hidden words, a third had up to 500 and 10% had more than 500 words concealed within them.

The hidden text revealed the names of document authors, their relationship to each other and earlier versions of documents.

Occasionally it revealed very personal information, such as social security numbers, which are beloved of criminals who specialise in identity theft.

The documents also exposed details of the internal networks they had travelled through, information that could help anyone looking for a route into a network.

Mr Byers described the problem of leaky Word documents as pervasive and suggested that anyone worried about losing personal information might want to consider using a different word processing program.

Alternatively, he recommends using utility programs that scrub information from Word documents, or following Microsoft's advice on how to make documents safer.

Source...