How can I acquire data in an efficient, ethical, and secure way, and how can I ensure that my data is used appropriately?
Goals:
Know what services are available
Understand DUAs, NDAs, and IRB
Plan for data security at all stages
Name a few partners and their role in data acquisition.
Why use data templates?
What tools might one use for data collection? Why?
Tell me about Data Security…
Give me an example of L3 and L4 data
Highlight some differences between L3 and L4 data
How are a DUA, an NDA, an IRB submission, and a Data Safety plan related? Or not?
Data generated by investigator:
Data acquired from others:
Does the data you need already exist? Do you know how & where to find it?
Is it already licensed by Harvard, or does it need to be acquired? Are appropriate funds available if needed?
Does it require a Data Use Agreement (DUA) or IRB submission?
Experiment | A scientific procedure undertaken to make a discovery, test a hypothesis, or demonstrate a known fact |
---|---|
Observation | The action or process of observing something or someone carefully or in order to gain information |
Simulations | The production of a computer model of something, especially for the purpose of study |
Derived / compiled | Data based on a logical extension, modification, or collection of items |
HBS Services can help faculty and their teams acquire data:
For persons from other schools, please contact your local library's data service professionals, or see https://hlrdm.library.harvard.edu/network.
Baker Library Subscriptions | Wide range of data available |
---|---|
Baker Research Services | Custom discovery and delivery of data |
Baker Faculty Data Licensing Service | Negotiation of licenses/DUAs with vendors (for faculty acquisition or purchase) |
Behavioral Research Services | Supports the data collection needs of HBS faculty and doctoral students conducting a broad range of experimental and behavioral research |
DRFD Research Administration | Supports DUAs and IRBs |
Research Computing Services (RCS) | Data collection via web scraping; wrangling via cleaning, matching, merging, etc. |
Consider using tools, templates, & data dictionaries when collecting data
Increases accuracy & efficiency
Promotes collection & preservation of metadata (source, year, …)
Promotes consistency & reliability (where, how, what, …)
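As a rough illustration of how a data dictionary can support accuracy and consistency, the sketch below checks incoming records against a small dictionary. The field names, types, and range rules are hypothetical and chosen only for the example; a real dictionary would be defined by the research team.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Entry:
    """One entry in a (hypothetical) data dictionary."""
    name: str
    dtype: type
    description: str
    required: bool = True
    check: Optional[Callable[[Any], bool]] = None  # optional range/format rule


# Illustrative dictionary; none of these fields come from the slides.
DATA_DICTIONARY = [
    Entry("respondent_id", str, "Study-assigned ID, not a real-world identifier"),
    Entry("source", str, "Where the record came from (metadata)"),
    Entry("year", int, "Year of collection (metadata)",
          check=lambda y: 1900 <= y <= 2100),
    Entry("firm_size", int, "Number of employees", required=False,
          check=lambda n: n >= 0),
]


def validate(record: dict) -> list[str]:
    """Return a list of problems found in one record (empty list means valid)."""
    problems = []
    for entry in DATA_DICTIONARY:
        value = record.get(entry.name)
        if value is None:
            if entry.required:
                problems.append(f"{entry.name}: missing required value")
            continue
        if not isinstance(value, entry.dtype):
            problems.append(f"{entry.name}: expected {entry.dtype.__name__}")
        elif entry.check is not None and not entry.check(value):
            problems.append(f"{entry.name}: value {value!r} out of range")
    # Flag columns that the dictionary does not describe.
    known = {entry.name for entry in DATA_DICTIONARY}
    for extra in sorted(set(record) - known):
        problems.append(f"{extra}: not in data dictionary")
    return problems
```

Running `validate` on each record at collection time surfaces missing metadata and undocumented columns early, before the data are merged or shared.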
- For collecting data, use electronic notebooks
- OneNote/O365
- Documents in HBS SharePoint/O365
- Evernote ($$)
- FileMaker Pro ($$)
- Open Science Framework (OSF)
- RSpace (as a possible Harvard-wide tool)
- For surveys:
- HBS Qualtrics (Data at <= L3)
- HMS Redcap (Data at <= L4)
Again, keep appropriate data security in mind with external or 'synchronized' services
More at http://bit.ly/2RCosb4
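The level limits listed for the survey tools above can be kept in a small lookup so that a project checks a tool before collecting data with it. This is only a sketch: the two entries are the ones stated on the slide, and the function name and error message are hypothetical. Levels for any other tool should be confirmed with IT Security rather than guessed.

```python
# Maximum Harvard data security level each survey tool is listed for above.
# Only these two entries come from the slide; do not extend the table
# without confirming the level with IT Security.
MAX_LEVEL = {
    "HBS Qualtrics": 3,
    "HMS Redcap": 4,
}


def is_permitted(tool: str, data_level: int) -> bool:
    """Return True if `tool` is listed for data at `data_level` or above."""
    if tool not in MAX_LEVEL:
        raise KeyError(f"No documented level for {tool!r}; ask IT Security")
    return data_level <= MAX_LEVEL[tool]
```

For example, `is_permitted("HBS Qualtrics", 4)` is false, which signals that L4 survey data need a different tool.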
- The need for data security touches upon all steps of the data lifecycle!
- A thorough understanding of the data, metadata, and its custodianship will drive the RDM narrative
- All persons should understand and comply with the HBS IT and HU data security requirements
- Your understanding of the data and the data security requirements will inform:
  - Your options for acquiring / transferring data
  - Your options for storing the data
  - Your options for analyzing the data
- Example: PII / Human Subjects data can be stored on L4 research storage as part of the HBS RC environment, but not on Windows & Mac desktops & laptops
- Data security is important to consider both on and off campus:
  - Email at home?
  - Using your mobile phone or tablet?
  - What about while traveling?
  - Even more important in our remote-work/pandemic status
- Security is more than where you store it – it's how you approach the care, handling, and movement of data
- This will vary depending on sensitive data level
- May be determined by Data Safety plan.
- See IT Security handout for appropriate considerations
- And these other helpful websites:
Data Security via Data Safety Portal
- Submit data security plans at the Harvard Data Safety Portal.
- A Data Safety plan will be required for all DUAs & IRB submissions deemed to include sensitive data*
- Helps faculty research groups plan and execute good RDM practices, including:
  - What resources should be used
  - What persons should be involved in the data acquisition, analysis, and sharing
  - What restrictions may apply based on the data content, stewardship, or geographic location/source of the data
- Will dictate compliance with Harvard L2, L3, or L4 data security protocols
- You might be involved in helping to prepare the Data Safety plan.
- Your local IT Security, RC Center, Research Administration, or Library Data group can:
  - Help you create a data security plan compliant with data and university requirements
  - Discuss what may be the best options for short- and long-term projects.
  - Tip! Use the Data Safety Plan User Guide for examples & guidance
*This will be covered in just a few slides
Data Protection Regulations & Policies
- There are a number of regulations and policies already in play:
- HIPAA (18+ identifiers – alone or in combination datasets) Informed Consent
- FERPA (education information and special protections)
- MA data protection law (security requirements to handle private data from state residents)
- Stem Cell data and Genomics data must be published in an approved repository and must also be de-identified.
- GDPR (General Data Protection Regulation in Europe)
- PIPL for data coming from China
- California Data Privacy ( CCPA + CPRA )
- Harvard Data retention (7 years)
- This is a rapidly-changing landscape!
- China's policy effective November 2021
- GDPR regulations have changed 2x in the last several years
- California's laws have been changed/amended 2x, expanding the scope
- DRFD Research Administration & OVPR are here to help
See PDF at https://security.harvard.edu/handout-research-data-security-levels-examples
DUAs are legally binding documents that should be signed only by authorized representatives of the school or University
- DUAs are almost always required when there is transfer of data
  - July 15, 2021: Harvard Research Data Security Policy went live (HU OVPR)
  - Balances risks & challenges & considers regulatory and contractual constraints
  - Some exceptions are permitted; consult your Research Admin office if unsure
- Several groups at HBS can help
  - Assist with the process of DUA preparation, review, & signing
  - Done in coordination with Harvard's Office of Sponsored Programs
  - No-cost DUAs: Alain Bonacossa (DRFD) or Katherine McNeill (Baker)
  - Else, contact the Data Licensing Service: Katherine McNeill (Baker)
- These govern access to and treatment of data:
  - May be required by a data provider with Harvard for use in your (local or school-level) research, or
  - Provided by Harvard to an outside organization for use in its research.
- Can be referred to as:
  - License agreement
  - Confidentiality Agreement
  - Non-disclosure agreement
  - Memorandum of Understanding
  - Memorandum of Agreement
- …but these are all distinct and separate types of agreements with different purposes
__IRB approval is required when conducting human subjects research__
Research = systematic investigation, including development, testing, and evaluation, designed to develop or contribute to generalizable knowledge.
Human subject = living individual about whom an investigator conducting research obtains (1) data or biospecimens through intervention or interaction with the individual; or (2) identifiable private information or identifiable biospecimens.
Human Subjects Research = the systematic collection of information about people designed to develop or contribute to generalizable knowledge.
__Note:__ Not all research is human research. You may be conducting a systematic investigation that involves people, but it may not be generalizable. Or it may be generalizable, but it is not about people.
Primary data collection | Secondary data collection |
---|---|
Experiments (field, online, lab) | Analysis of individual-level identifiable data |
Surveys, interviews, observations | Scraping data from (non-public) websites |
 | Merging data from multiple sources |
Best resources for information and to contact:
Harvard University Area IRB for main campus and Allston:
Committee on the Use of Human Subjects (CUHS)
Longwood Area IRB for Medical School, Dental School, and T.H. Chan School of Public Health:
Office of Human Research Administration (OHRA)
HBS:
Alma Castro is available to advise you on federal & state regulations and university policies that apply to research with human subjects.
The DRFD Research Administration team also reviews IRB applications on behalf of Harvard’s Committee on the Use of Human Subjects (CUHS).
- DRFD Compliance: https://inside.hbs.edu/Departments/drfd/Pages/research-compliance.aspx
- CUHS (HUA/IRB): https://cuhs.harvard.edu/
- Two broad groups of non-public data
- Confidential/Proprietary: Data that is licensed, provided by DUA, NDA, etc.
- IRB-related: Sensitive and Non-Sensitive (but confidential)
- Sensitive data usually include Personally Identifiable Information (PII), health data, financial data, etc.
- De-identification may be a tedious yet important step that should be done cautiously & thoroughly
- _Highly recommend_ that this be done by the data provider before receipt of the data
- Re-identification by grouping secondary data (or indirect identifiers) is very possible
- Consider multiple approaches to permit data granularity and fidelity while preventing re-identification
- _E.g., with the first three digits of ZIP code + year of birth, 0.04% of individuals can be re-identified, vs. 87% with ZIP + birthday + sex (Sweeney et al., 2000)_
- Just as important to promote preservation and re-use of sensitive data
- Don’t promise to destroy your data
- Don’t promise not to share your data
- Do get consent to retain and share data
- Do incorporate data-retention and -sharing clauses into IRB templates
- Many evolving techniques to safeguard privacy yet promote reuse
- (HBS) Contact RCS, KLS RDP, or DRFD Research Admin if you have any questions. (Others) Please contact your Research Admin office.
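The re-identification point above can be checked on a dataset before it is shared. The sketch below is a minimal illustration, assuming records that hold a ZIP code, a date of birth, and sex as indirect identifiers. It compares how many records are unique under the full identifiers versus a coarsened key (first three ZIP digits plus year of birth). The records, field names, and function names are made up for the example, and this is not a substitute for a formal disclosure-risk review.

```python
from collections import Counter

# Toy records, invented for illustration only.
RECORDS = [
    {"zip": "02163", "birth_date": "1980-03-14", "sex": "F"},
    {"zip": "02163", "birth_date": "1980-11-02", "sex": "M"},
    {"zip": "02138", "birth_date": "1980-07-21", "sex": "F"},
    {"zip": "02139", "birth_date": "1975-01-30", "sex": "M"},
    {"zip": "02134", "birth_date": "1975-09-09", "sex": "M"},
]


def full_key(record: dict) -> tuple:
    """Indirect identifiers at full granularity: ZIP + birthday + sex."""
    return (record["zip"], record["birth_date"], record["sex"])


def coarse_key(record: dict) -> tuple:
    """Coarsened identifiers: first three ZIP digits + year of birth."""
    return (record["zip"][:3], record["birth_date"][:4])


def unique_fraction(records: list, key) -> float:
    """Fraction of records whose identifier combination appears only once."""
    counts = Counter(key(r) for r in records)
    return sum(1 for r in records if counts[key(r)] == 1) / len(records)


def smallest_group(records: list, key) -> int:
    """Size of the smallest group sharing a key (the k in k-anonymity)."""
    return min(Counter(key(r) for r in records).values())
```

On the toy records, every row is unique under `full_key`, while under `coarse_key` no row is unique and the smallest group has two members. Coarsening reduces granularity, so the right trade-off depends on what the analysis needs.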
- _Professor Smith has asked you to obtain company financial data compiled by a firm called FinanceCorp. These data include confidential company financial data, and are therefore considered extremely sensitive. The FinanceCorp data will be merged with the data Professor Smith collected five years ago from company CEOs._
- How will you plan for this study?
- Who could help you determine:
- If the IRB should be involved?
- If a DUA is needed?
- If the data are affected by GDPR?
- What Harvard security level the data might be?
- Who could help you store and transfer the data?
Services are available to help with data acquisition, no matter if acquired by the investigator (primary) or from others (secondary).
Use software tools to aid in efficient & accurate collection
In whatever manner the data are acquired, be mindful of requirements related to IRB, DUAs, and Data Security levels
Confidential / sensitive data require special precautions at all stages
References
https://inside.hbs.edu/Departments/it/security/Pages/default.aspx
Electronic (Lab) Notebooks: http://bit.ly/2RCosb4
https://grid.rcs.hbs.org/transferring-data
https://inside.hbs.edu/Departments/it/security/Documents/InfoSecQuickGuide20200414-HBS.pdf
https://vpr.harvard.edu/files/ovpr-test/files/dua_policy_statement_final.pdf
https://ras.fss.harvard.edu/files/ras/files/safety_submission_guide.pdf
https://researchdatamanagement.harvard.edu/human-subjects-research
https://huit.harvard.edu/remote
https://www.harvard.edu/coronavirus/work-remotely
https://inside.hbs.edu/Departments/it/howto/Pages/work-remote.aspx