Author(s):
Integrating data science into undergraduate STEM education is crucial for preparing students to thrive in an increasingly data-driven world. The National Academy of Sciences and the NSF emphasize the development of a workforce capable of harnessing the data revolution. However, instructors face challenges in embedding data science concepts and applications into their courses, such as packed curricula, diverse student backgrounds, and a lack of course-specific data science learning modules. This IUSE/NSF project addresses the critical need to understand the principles and best practices for integrating data science instruction across six different STEM courses by leveraging a multi-institutional collaboration (Virginia Tech, NCA&T, and Vanderbilt University) guided by a research-practice partnership (RPP) framework. The overarching research question was: What are key considerations, identified from instructors’ and students’ perspectives, for integrating data science concepts across STEM+C disciplines? Six sub-questions related to this overarching question were explored. The project team developed and implemented 12 data science modules tailored to the disciplinary, academic-level, and pedagogical requirements of the six STEM courses at the three universities over eight semesters. Survey data were collected from over 800 students and 6 instructors and analyzed to answer the research questions. We found that instructors designed their modules to meet course needs and selected datasets based on availability, relevance, quality, data source platform, and familiarity with data analysis tools. Data science topics common to all the courses included generating and interpreting visualizations and conducting basic statistical analyses, a finding that can guide other instructors who wish to integrate data science into their own STEM courses.
Instructors faced challenges with the variability in students’ data science skills and with the limitations of online courses during the COVID-19 pandemic, and they addressed these challenges with strategies such as training tutorials, teaching assistants, and demonstration videos. Regarding changes in students’ perceptions of data science and their related skill sets after completing a course with integrated data science modules, survey responses indicated adequate exposure to core data science topics through the modules and significant improvements in self-assessed abilities across the main data science topics. Finally, an analysis of students’ grades and self-assessment surveys suggested that students’ perceptions of their data science-related skills aligned with instructors’ goals. The RPP approach proved effective in integrating data science modules into multiple STEM courses, exposing more than 800 students to data science instruction, 80% of whom were from groups underrepresented in STEM. The data science modules are available to interested parties at ds4stem.org and hydrolearn.org.
Coauthors
Gautam Biswas, Vanderbilt University, Nashville, TN; Manoj K. Jha, NCA&T, Greensboro, NC.