
Introducing the Qualitative Data Sharing (QDS) Toolkit


Rachel H. Lee

We spoke with James M. DuBois, DSc, PhD, principal investigator of the QDS Project, about the new Qualitative Data Sharing (QDS) Toolkit. The QDS project team created the toolkit to support researchers who seek to share qualitative research data and to provide timely guidance for navigating the National Institutes of Health (NIH) Policy for Data Management and Sharing. The policy went into effect in January of this year and applies to NIH-funded research that results in the generation of scientific data. Dr. DuBois is the Bander Professor of Medical Ethics and the Executive Director of the Bioethics Research Center at Washington University in St. Louis.

Thank you for developing the QDS Toolkit to support ELSI researchers who are planning to share qualitative research data. What need did you see that inspired you to develop this resource?

Dr. DuBois: I served on the NIH SEIR study section for many years and reviewed data sharing plans. I noticed that no one shared their qualitative data. I also noted that the NIH policy requiring data sharing did not explicitly provide an exception for qualitative data. Our team started looking into things. We published an article in the journal Qualitative Psychology asking, “Is it time to share qualitative research data?” We made the argument that not all qualitative data should be shared, but a lot of qualitative data could be shared responsibly and there would be significant benefits in doing so. Several nations—e.g., Australia, Finland, and the UK—had dedicated repositories for qualitative data. However, many resources still appeared lacking, and support for the idea seemed low in the U.S. In 2016, we submitted an R01 proposal to NHGRI to engage stakeholders, explore barriers, develop resources, and pilot qualitative data sharing. The rest is history.

Why should qualitative ELSI researchers use the QDS Toolkit? Are there any features you would especially like to highlight?

Dr. DuBois: Most qualitative researchers have never shared data with a repository. There are many practical, ethical, and legal issues when sharing data. How do I de-identify data? Do I need IRB approval to share data? What repositories should I work with? What do I need to share—coded excerpts, full transcripts, or audio files? Should I openly share my data or restrict access? These are big decisions. Our toolkit’s guide provides an overview of these issues and educates researchers on a range of acceptable options.

The National Institutes of Health (NIH) has mandated the sharing of all data, including qualitative data. What advice do you have for qualitative researchers who are creating a data management and sharing plan that satisfies the new NIH requirement?

Dr. DuBois: I recommend reaching out to a data repository early in the process. The Qualitative Data Repository at Syracuse and ICPSR at the University of Michigan have experience curating qualitative data and can advise on data sharing plans and budgets. Institutional repositories (e.g., at a researcher’s home university) may provide another option, but often they lack experience with qualitative data and are not prepared to share data under a restricted access arrangement. A dataset that is not sensitive and is de-identified might be shared open access, but in general, we advise sharing qualitative data using a restricted access arrangement. This requires secondary users to have IRB approval for their uses, share their plan to protect data, and promise not to attempt re-identification of participants.

What are some of the unique ethical issues that researchers who need to share qualitative data should be thinking about?

Dr. DuBois: Most participants seem to support data sharing if their data are de-identified. After data are de-identified, IRBs do not even consider research using them to be human subjects research. The biggest issue in most studies is reducing the risk of re-identifying a participant, while ensuring that data remain useful for secondary analyses. Secondary users of qualitative interviews—if the topics that were explored are at all sensitive—should be required to apply to use those data. They should have IRB approval, have a plan to protect data as if they were their own, and promise not to attempt to re-identify participants. Having excellent processes for data sharing and being transparent about these processes is key to protecting public trust in the research enterprise. Following repository guidelines for data sharing is also key to supporting high-quality secondary analyses. There is no point in sharing data if they cannot be used in meaningful ways.

What concerns of potential research participants about qualitative data sharing do you think ELSI researchers should prepare for?

Dr. DuBois: We did 30 interviews with participants in diverse qualitative research studies, and have reviewed the work of Rebecca Campbell, Sebastian Karcher, and others. In general, participants are very supportive of data sharing. These are people who agreed to participate in research, and they want their participation to have a positive impact. If other researchers can use their data constructively, they support it. Even when payments are a primary motivation, participants usually state they are also motivated to improve things for people like themselves. This of course assumes that data shared are adequately de-identified and that only other researchers can access them.

Things may be quite different when working with a well-defined community or with tribal nations. In such cases, there may be greater concerns about data ownership and control, as well as secondary research uses of data. Having conversations about data sharing in the study planning phase is essential.

What recommendations do you have for communicating about data sharing with qualitative research participants?

Dr. DuBois: I would explain the protections that will be in place. Tell them about data de-identification and whether data will be available only to other researchers who have had their protocols reviewed and approved by an IRB. Then, they can make an informed choice about study participation.

The QDS Toolkit has a planning guide to help researchers prepare to share qualitative data. Could you please tell us more about why researchers should plan to share data? For example, what activities does the guide recommend before data collection begins?

Dr. DuBois: Before data collection begins, it is helpful to develop consent language on data sharing, decide when you will de-identify data, and consider the timing of first publication. The updated NIH policy requires sharing data at the time of first publication (or sooner), so researchers will want to be prepared. Our toolkit offers a lot more detail on how to prepare.

What are some benefits of sharing qualitative data that come to mind for you?

Dr. DuBois: Studies are cited more frequently when data are shared. Without increasing data collection expenses or burdens on participants, data sharing supports the exploration of new research questions. Data sharing greatly increases transparency, which can increase trust in research and discourage biased or fraudulent work. Even when we aim to be “objective,” we must realize that researchers typically identify the themes that will be published, ignore others, and pick illustrative quotes. This means that more than 99% of statements are never seen by anyone outside of the research team.

Can you think of any types of qualitative data that are easier to share relative to others? Are there any types of qualitative data that should not be shared at all?

Dr. DuBois: These days, our team is conducting a lot of interviews via Zoom. I generally like the quality of the data we obtain this way, but facial images and voice prints are HIPAA safe harbor rule identifiers. Ordinarily, our team—and the repositories we work with—feel that transcriptions of interviews are the best form of data to share. Some researchers ask about just sharing their codebook and excerpts that they highlighted when applying codes. In my view, that is not data sharing. That’s sharing findings, not the data you analyzed to generate the findings. It offers none of the benefits of data sharing that I mentioned above in terms of supporting secondary analysis, increasing transparency, and managing bias.

Please tell us about the Qualitative De-identification Support (QuaDS) Software your team is developing. What does it do?

Dr. DuBois: De-identifying data “manually” is a very time-consuming process that requires development of a de-identification protocol and practice. One of the barriers to qualitative data sharing has been the burden of adequately de-identifying data. We worked with the informatics team at Washington University to develop QuaDS software to assist with data de-identification. It highlights instances of HIPAA safe harbor identifiers, such as names and addresses, as well as a series of variables that, when combined with other information, could potentially identify someone, such as race, rare disease diagnosis, LGBTQI+ identity, and institution name. In our pilot tests, the software has performed very well with an F score of .96, which indicates excellent precision and recall (sensitivity). (COI Disclosure: QuaDS may be licensed to a vendor over the next year, and as a developer, I could receive royalties through Washington University.)
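
For readers unfamiliar with the metric: an F score (here assumed to be the common F1 variant) is the harmonic mean of precision and recall. The sketch below shows only that arithmetic. The function name and the counts are hypothetical illustrations, not figures from the QuaDS pilot tests and not part of the QuaDS software.

```python
def f_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1 score: the harmonic mean of precision and recall."""
    flagged = true_positives + false_positives   # everything the tool highlighted
    actual = true_positives + false_negatives    # every real identifier in the text
    if flagged == 0 or actual == 0:
        return 0.0
    precision = true_positives / flagged  # share of highlights that were real identifiers
    recall = true_positives / actual      # share of real identifiers that were highlighted
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 96 identifiers flagged correctly, 4 false alarms, 4 missed.
print(round(f_score(96, 4, 4), 2))  # prints 0.96
```

Because the harmonic mean is pulled toward the lower of the two values, a high F score requires both few false alarms and few missed identifiers.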

Is there anything else you would like to share?

Dr. DuBois: When we surveyed over 400 qualitative researchers, we found they were nearly evenly split into those who supported and those who opposed sharing qualitative data. But everyone had some concerns about it—concerns about how participants might feel, about the time it will take, about expenses, and many other things. For most researchers, this is an adventure into the unknown. It is important to know that some researchers have shared data in a responsible manner, there exist repositories with relevant expertise, and things should get easier over the next few years as we grow in experience and resources.

The QDS project team at the Washington University School of Medicine in St. Louis developed the QDS Toolkit to support researchers in planning for and sharing qualitative data in an ethical manner. The toolkit includes a guide that covers research project planning, data collection and management, and more. Additionally, there is information on the forthcoming Qualitative De-identification Support (QuaDS) Software, which is intended to speed up data de-identification.

For more help navigating the 2023 NIH Data Management and Sharing (DMS) Policy, visit ELSIhub’s curated resource set, which includes the list of NIH-supported Data Repositories and DMPTool, an online application that helps you build your data management plan.