ELSIcon2022 • Paper • June 3, 2022
Presented by: Sarah Nelson
Jacklyn Dahlquist, Stephanie M. Fullerton
Individual-level genomic, environmental, and linked health outcome data are being generated at an unprecedented pace and scale in human biomedical research, posing profound technical and logistical challenges for data storage, sharing, and analysis. In response, new cloud-based computing and storage platforms are being developed to support data processing and analysis, accompanied by efforts to streamline and even partially automate data access. While these platforms and procedures are designed to facilitate collaboration and maximize the scientific utility of costly-to-generate genomic and linked clinical data, their broader ethical implications remain under-examined by the ELSI community. To better understand the impact of these new approaches on data stewardship, we conducted a landscape analysis of four newly developed NIH-supported cloud platforms: the NHLBI BioData Catalyst, the NHGRI AnVIL (Analysis, Visualization, and Informatics Lab-space), the Kids First Data Resource Portal, and the All of Us Research Hub, as well as two predecessor data sharing platforms, the NCBI Database of Genotypes and Phenotypes (dbGaP) and the NCI Genomic Data Commons. In this presentation we report initial findings based on a content analysis of platform documentation, participant observation of platform developer discussions, and key informant interviews with platform developers and related leadership. Our observations suggest that while cloud-based platforms help centralize data storage and analysis, distributed approaches to managing data access complicate the roles and responsibilities of data stewards, with potential implications for both participant and public trust.