EDRM Micro Datasets

EDRM has begun to publish what will be a series of “Micro Datasets,” some available to the general public and some for EDRM members only. These datasets are designed for eDiscovery testing and process validation. Software vendors, litigation support organizations, law firms and others may use these smaller sets to qualify file-type support, test the speed and accuracy of indexing and search, and conduct more forensically oriented analytics exercises throughout the eDiscovery workflow.

The EDRM community thanks these members for their active participation in this important initiative:

  • Eric Robi
  • Michael Lappin
  • Chad Main
  • Henry Moreno

Public EDRM Micro Dataset

The initial Public EDRM Micro Dataset is a zip file of approximately 136.9 MB containing files in current formats, ranging from Microsoft Office and Adobe Acrobat documents to image files. The EDRM Dataset group searched the internet for usable, freely available data at universities, government sites and elsewhere, and a selection of that data is included in the zip file.

Members-Only EDRM Micro Dataset

The initial Members-Only EDRM Micro Dataset is similar to the initial public dataset but much larger, at approximately 5.5 GB. It is available only to EDRM members.

The full dataset is sourced from publicly available data and free from copyright restrictions. It was assembled by the Digital Forensics Research Laboratories at the Auckland University of Technology, in collaboration with the EDRM Dataset team.

The EDRM Micro Dataset is valuable for its wide variety of file types and for other challenges characteristic of ESI collected in discovery cases. The files have various levels of corruption, and the dataset includes an encrypted duplicate of the file set to support exception-handling exercises and advanced testing.

The EDRM Micro Dataset mix of file types includes:

  • A variety of .csv files
  • Websites and web pages
  • Adobe Acrobat files
  • Graphic files and photographs
  • Public census data
  • Microsoft Office files
  • Audio files
  • 4 email boxes with shared correspondence, threads and attachments
  • Multiple EnCase .e01 files containing data from a phone and another data source

The Dataset team includes:

  • Eric Robi, president, Elluma Discovery
  • Michael Lappin, director, Technology and Sales Engineering, Nuix
  • Chad Main, founder, Percipient
  • Henry Moreno, eDiscovery manager, Dell Inc.
  • Brian Cusack, director, AUT Digital Forensic Research Laboratories, and professor, ECU Security Research Center, Auckland University of Technology