Handling Sensitive Data
Data may be considered sensitive if it might cause harm, violate privacy, or otherwise put individuals or groups of people at risk. However, it is often still possible to share sensitive data by addressing informed consent, de-identification, and access controls.
Understanding Data Sensitivity
UMD defines four levels of data sensitivity: Low (Level 1), Moderate (Level 2), High (Level 3), and Restricted (Level 4).
- Low Risk (Level 1) - Low risk data is data whose release or unauthorized disclosure would cause little or no harm to individuals, groups, or the university. For example, if a breach exposed your name, birthday, and email address, little harm would be done to you.
- Moderate Risk (Level 2) - Moderate risk data is data whose release or unauthorized disclosure would likely cause adverse effects to individuals but would not harm the university. Examples include an employee performance review, a UID, or internal budget planning. The release of moderate risk data may affect individuals in social, psychological, reputational, financial, or legal ways.
- High Risk (Level 3) - High risk data is data whose release or unauthorized disclosure would cause significant harm or adverse effects to individuals, groups, or the university. Examples include mental health records, conduct or disciplinary records, Social Security numbers, driver's license information, and student loan information.
- Restricted Data (Level 4) - Restricted data is data whose access and use are strictly controlled by laws, regulations, or contracts. Unauthorized access, use, disclosure, or loss will have significant legal consequences, including civil and criminal penalties, loss of funding, inability to continue current research, and inability to obtain future funding or partnerships. HIPAA data, FERPA data, CUI data, and export-controlled data are all subject to laws, regulations, and/or contracts, and their unauthorized release would result in significant harm to individuals, groups, and the university.
For more information, see the University of Maryland Data Classification Standard as well as the DIT Data Risk Guide to Commonly Used DIT Services.
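As a rough illustration of how the four levels might be applied when taking inventory of a dataset, the sketch below tags a few of the example data types mentioned above and reports the highest level found. The field names, the `classify_dataset` helper, and the rule that a dataset takes the level of its most sensitive field are assumptions made for this example, not part of the UMD standard. Level 4 depends on the laws, regulations, or contracts that apply to a project, so it cannot be inferred from field names and is not assigned here.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """The four UMD sensitivity levels described above."""
    LOW = 1
    MODERATE = 2
    HIGH = 3
    RESTRICTED = 4


# Hypothetical field names, mapped to the levels of the examples given above.
EXAMPLE_FIELD_LEVELS = {
    "name": Sensitivity.LOW,
    "birthday": Sensitivity.LOW,
    "email": Sensitivity.LOW,
    "performance_review": Sensitivity.MODERATE,
    "uid": Sensitivity.MODERATE,
    "mental_health_record": Sensitivity.HIGH,
    "ssn": Sensitivity.HIGH,
    "drivers_license": Sensitivity.HIGH,
}


def classify_dataset(fields, default=Sensitivity.HIGH):
    """Return the highest level among the dataset's fields.

    Fields not in the table fall back to `default`, an assumption made
    here so that unreviewed fields are treated cautiously. A dataset
    with no fields is reported as LOW.
    """
    levels = [EXAMPLE_FIELD_LEVELS.get(field, default) for field in fields]
    return max(levels, default=Sensitivity.LOW)


if __name__ == "__main__":
    print(classify_dataset(["name", "email", "uid"]).name)
```

A table like this is only a starting point for discussion; the Data Classification Standard itself remains the authority on how a given dataset should be classified.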
Preparing Sensitive Data for Publication and Sharing
Informed Consent
At the University of Maryland, research involving human subjects is governed by the Institutional Review Board (IRB) and its processes for approving, overseeing, and confirming the conduct of these projects. Completing the IRB process, including a thorough description of your plans for data storage, handling, and publishing, can help you to confirm that your plans will align
Much of what you can share in published data will be dictated by the informed consent that you document on behalf of your subjects. Informed consent should specifically address your intentions to share, store, and allow other researchers to use the data you collect, which is yet another reason why data sharing should be part of your early project planning and documentation. If possible, written informed consent forms that can be signed and stored are preferable, although verbal consent, if approved by the IRB, may also be sufficient.
Your informed consent should address the following:
- Who will be able to access the data? Will it be publicly available, and through which repositories and/or indexes? Will any of the data be available only to certain audiences or restricted to certain groups, i.e., the project owners, members of the UMD campus, or specific groups of other scholars?
- What will the data be used for? Will you place restrictions on the dataset to limit reuse to research and/or teaching? Might it be used for commercial purposes?
- What kinds of confidentiality protections will you employ? Will certain information be redacted or restricted to certain audiences? Will you share only a subset of files or change the format of the data (e.g., transcripts in place of audio recordings)?
If you did not originally obtain informed consent for data sharing, you may be able to retroactively seek consent. If you have contact information available, you may reach out to participants and obtain consent forms stating that the information they provided may be preserved for the long term, shared, and used by other scholars. If you are interested in seeking retroactive consent, you should contact the UMD IRB, since an amendment to your protocol and a review of your application will typically be necessary.
De-Identification
Personally identifying information, both direct and indirect, can be removed from a dataset in several ways in order to protect the privacy of human subject participants. Some practices and tools can be used from the outset (such as not asking participants to state their names, or using pseudonyms), and others throughout the data analysis process (such as generalizing or grouping information, for example into date ranges or broader geographic locations). Technical processes can also thoroughly redact data fields, manipulate audio and video files, or replace audiovisual materials with transcripts and other, more anonymous formats.
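To make a few of these practices concrete, here is a minimal Python sketch that replaces names with pseudonyms, generalizes a birth date into an age range, and coarsens a ZIP code. The record layout, the function names, and the choices of ten-year age bands and three-digit ZIP prefixes are all assumptions made for illustration. They are not IRB or QDR requirements, and generalizing a few fields does not by itself guarantee that a dataset is de-identified.

```python
import hashlib
import hmac
from datetime import date


def pseudonym(name: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a stable, keyed pseudonym.

    The same name always maps to the same code, so records can still be
    linked, but the code cannot be recomputed without the secret key.
    """
    normalized = name.strip().lower().encode("utf-8")
    digest = hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()
    return "P-" + digest[:8]


def age_band(birth_date: date, on: date, width: int = 10) -> str:
    """Generalize a birth date into an age range such as '30-39'."""
    had_birthday = (on.month, on.day) >= (birth_date.month, birth_date.day)
    age = on.year - birth_date.year - (0 if had_birthday else 1)
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def zip_prefix(zip_code: str, keep: int = 3) -> str:
    """Generalize a ZIP code by masking all but its first `keep` digits."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def deidentify(record: dict, secret_key: bytes, on: date) -> dict:
    """Return a copy of the record with identifying fields generalized."""
    return {
        "participant": pseudonym(record["name"], secret_key),
        "age_range": age_band(record["birth_date"], on),
        "region": zip_prefix(record["zip_code"]),
        # Free-text responses may still contain identifying details and
        # usually need separate review before sharing.
        "response": record["response"],
    }


if __name__ == "__main__":
    raw = {
        "name": "Jane Doe",
        "birth_date": date(1990, 6, 15),
        "zip_code": "20742",
        "response": "I commute by bus.",
    }
    print(deidentify(raw, secret_key=b"keep-this-key-private", on=date(2024, 1, 1)))
```

In this sketch the key must be stored separately from the shared data, since anyone who holds it can regenerate the pseudonym for a known name.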
The right strategy depends on the nature of your data and your data sharing goals. Some of these tools, such as digital manipulation, may be time-intensive or expensive to employ. We encourage those who are exploring forms of de-identification to speak to the IRB, and also to consider a consultation with the Qualitative Data Repository (QDR), a trusted community partner with which the UMD Libraries maintain a membership. You can schedule a consultation with QDR to review your data management plan and discuss using QDR as your trusted repository by reaching out to them directly (qdr@syr.edu) or by writing to lib-open-scholarship@umd.edu to ask a librarian to connect you.
QDR also provides information on de-identification on its website.