University team unveil data set to bolster research into ransomware detection

Newly published paper details the creation of NapierOne

Cyber security experts at Edinburgh Napier have created a new data set which will support cutting-edge research into ransomware detection.

Ransomware – malware that encrypts files, giving the attacker scope to demand a ransom to restore access – has become a popular and potentially lucrative method of attack for cyber criminals.

However, the newly created NapierOne data set (www.napierone.com) is now available to help test and evaluate new detection methods, amid concerns that previous data sets used in digital forensics research have become outdated.

The new data set is openly accessible and ready to use, and will improve consistency by using standard formats, allowing earlier studies to be replicated. As such, it will improve the pace and direction of research into ransomware, and could help find robust solutions to the threats it poses.

NapierOne’s creators also believe it is generic enough to support many other fields of research that require a varied mix of common files.

Govdocs1

The most well-known publicly available data set used in malware analysis to date has been Govdocs1, now more than a decade old.

It was designed to help reproduce forensic research, but doubts have emerged about how well it reflects current usage, with some increasingly popular file types not being well represented.

And where there has been a lack of useful data sets available to researchers, they have often developed their own and have not distributed them when their work is complete.

In a new paper published in Forensic Science International: Digital Investigation, Edinburgh Napier PhD research student Simon Davies and senior computing academics Professor Bill Buchanan and Associate Professor Rich Macfarlane detail the creation of NapierOne as a complement to Govdocs1. 

Their research identified popular file formats for inclusion as they set about creating a data set containing more than 500,000 unique files distributed across 100 separate data sets and subsets.

The paper describes how specific file types were selected, how examples were sourced and how researchers are able to gain free, unlimited access to the data.

The authors see NapierOne as a starting point for an ongoing project which will grow and develop as other researchers provide additional data sets that can be incorporated into it.

Simon Davies said: “It is hoped that the adoption of the NapierOne data set into the implementation, development and testing lifecycles of new ransomware detection techniques will streamline and accelerate the development of more robust and effective detection techniques, allowing independent researchers to reproduce and validate proposed detection methods quickly.”

Associate Professor Rich Macfarlane said: “Ransomware has been around for many years – encrypting and deleting users’ files and demanding a ransom from the victim. It has become increasingly common and its sophistication has increased significantly, leading to it currently being the biggest cyber security problem globally.

“This work aims to provide a research data set allowing scientific rigour in research towards fighting the ransomware problem. The data set has been created and successfully used in our ransomware detection research.

“Containing over half a million unique files representing real world file types, it is broad and diverse enough to be used in a range of cyber security and forensic research areas.

“We hope the data set will have the same global research impact as the Govdocs1 work.”

Professor Bill Buchanan said: “There are few areas of cyber security that need more of a scientific base than in digital investigations, and thus there exists a need to make sure investigators have appropriate tools that have been verified and properly evaluated. This data set provides a foundation for researchers to prove their new methods, and thus further support innovation in the area.

“The UK is becoming an international leader in the field of safe technology – which involves the development of tools to support digital investigations and threat detection – and this research showcases the development of a strong scientific base.”