Valparaiso University Tackles Data Science Problems That Matter
In the 4th interview of our series Quantcrunch: Higher Education’s Response, which examines how different universities are helping to meet the data talent shortage, we spoke with Dr. Karl Schmitt, Director of Data Science Programs at Valparaiso University. Dr. Schmitt is an assistant professor in both the Mathematics and Statistics Department and the Computing and Information Sciences Department. He also runs the Masters of Analytics and Modelling degree program, a data science–like master’s degree.
We discussed the university’s undergraduate Data Science degree program with Dr. Schmitt. This relatively new major is going into its 4th year this fall.
As a math and computer science degree holder, Dr. Schmitt has a keen interest in all topics at the intersection of data science. As such, he has been instrumental in developing Valparaiso’s data science curriculum and establishing its vision and practice. Our conversation with Dr. Schmitt provides a glimpse into how a small, private Christian university without the resources of UC Berkeley or MIT is forging its own unique approach to preparing students for careers in data science fields.
Valparaiso University, also known as “Valpo”, is a private Lutheran college in northwest Indiana, just an hour east of Chicago. It is a small campus with just over 3,500 students. With its low student-to-faculty ratio of 11:1, Valpo was ranked No. 1 in undergraduate teaching in the Midwest by US News and World Report in 2018. Its College of Engineering is also ranked the 13th best undergraduate engineering program in the country.
How Valpo's Data Science Curriculum Was Developed
Given his experience working at the intersection of math and computer science, Dr. Schmitt had an initial idea of the concepts that he wanted to incorporate into a data science degree program, such as extensive hands-on learning. The curriculum that the department eventually put together fulfilled his goals while largely following recommendations from national workshops, reports and academic papers, along with Dr. Schmitt’s understanding of how data science and computer science mix. As specific courses were developed, examples and material from top-ranked universities such as Stanford, Brown and UC Berkeley, available on www.teachingdatascience.org, were incorporated.
Dr. Schmitt explained that initially the university did not need to develop many new courses to create its data science curriculum. However, the department recognizes that staying relevant requires new courses, some of which have already been launched (more on that later). Plans are in the works for others. Right now at Valpo, in the first semester, students majoring in data science take programming and statistics to build foundational skills. Then in the spring, students take a newly minted Introduction to Data Science course, a critical stepping stone to the remaining three years of study, which include two colloquiums and a capstone project.
In total, the data science curriculum requirements are on the lighter side (40 hours) compared to many other universities. Therefore, the Department of Mathematics and Statistics strongly encourages data science majors to use the extra credit space to pursue a double major, specializing in an area of application. Dr. Schmitt emphasized,
“Data Science is an applied field. You can’t just do data science in isolation. You need to have the language and context to talk about data science in a domain.”
To this end, in addition to the core computer science, math and statistics courses, the data science major requires a mid-level domain-focused course. This course can be chosen from six application areas: meteorology, physics, economics, biology, psychology and political science.
The data science curriculum also aims to offer a variety of courses to help students meet their particular end goals. For instance, while an artificial intelligence course is offered and is certainly a hot topic, not every student needs to learn AI for his or her career goals. Course options also include other areas such as civil engineering, business and econometrics.
Because of this diverse course offering, the students in the data science program mix with students from fields such as engineering, secondary education and psychology in their undergraduate data science courses. Dr. Schmitt noted,
“This provides a lot of variety for interesting discussions.”
Students Tackling Problems That Matter
The newly created Introduction to Data Science course, according to Dr. Schmitt, “takes learning in a very different direction.” The course content includes typical introductory data science material, but the approach to learning is driven by Dr. Schmitt’s experience and his knowledge of computer science education’s best practices for retention and success.
In practice, this means that Dr. Schmitt designed his Introduction to Data Science course for underclassmen around the philosophy that students should “tackle problems that matter”. His experience has shown that,
“Students learn best when they engage in real-time hands-on learning with things that someone cares about.”
To this end, Dr. Schmitt continuously engages his own and the university’s network to find organizations that will offer students real-world projects to work on during the semester. In the past, students have worked with organizations such as the US Geological Survey, a regional metro planning department, a regional gas station, Upbring and Cleo.
Not only do students learn best with this approach, but Dr. Schmitt has observed that,
“The reality is that so many organizations need help with their data. Students could work on canned problems, or, actual problems where someone would care about what they did.”
Although Dr. Schmitt admits that these kinds of hands-on projects are difficult to manage at such an early stage of the data science learning process, the rewards are great. Students of the course are sometimes offered summer internships with the companies whose projects they work on. A recent graduate has even received job offers and graduate school positions based on his data science project.
Now, as the course goes into its 4th year, the process of managing these hands-on projects is getting easier. Although it is difficult to manage students who are new to the field on real-world projects, Dr. Schmitt has learned to address these difficulties in two ways:
- Leverage the diverse makeup of the students in class – In the Introduction to Data Science course there are students from all years and from several other degree programs. He can leverage students who have a bit more experience to help support others.
- Don’t guarantee results – Dr. Schmitt provides no guarantee of results to the companies who participate in projects. He emphasizes rather,
“The process is important, rather than results.”
That said, with a few years under Dr. Schmitt’s belt now, student projects are experiencing a higher success rate, and this is expected to continue as the course evolves. To date, student projects have achieved a 60-70% success rate, with “success” being defined as not a total and complete failure. Considering that Gartner once stated that 85% of corporate big data projects fail, it seems that the Valpo students are doing pretty well!
Incorporating Valpo’s Lutheran Values into a Technical Curriculum
Valpo is a Lutheran university that aims to instill in its students a sense of service. Dr. Schmitt explains that it is difficult to explicitly infuse these values into technical courses because while some students may value the concept of service to community, not all do. Even fewer see the direct connections of mathematics or programming to philanthropic service.
However, Dr. Schmitt views many of the projects that the students work on “as service” to a community of organizations that need assistance with data science. This includes many non-profits and regional government agencies, which provides at least one avenue for pursuing the university’s values. In addition, teaching students to look at bias and ethics in their data science pursuits and to entertain differing perspectives will help them to think about what it means to pursue a multi-faceted view of “truth”.
Dr. Schmitt would like to continue incorporating the value of “doing meaningful work” into future curriculum design including colloquium and senior capstone projects.
The Future of Valpo’s Data Science Degree Program
More hands-on learning
Dr. Schmitt’s first hope for the future of Valpo data science is that as more data science majors are exposed to hands-on learning projects early on, they’ll have the opportunity to continue working on these data science projects throughout their years at Valpo. This will help deliver even better results for clients and a more meaningful experience for the students. One successful example of this is a recent meteorology student who worked on data from his project for two more years, publishing articles and presenting posters about his work.
Another of Dr. Schmitt’s hopes is that these real-world projects will find him. He explains that soliciting these projects is very time-intensive. He spends a lot of time reaching out to his network to educate companies and encourage them to participate and come up with projects. He explains,
“It’s challenging to find project sponsors because companies don’t want to share their data.”
He emphasizes that he and the students can and have previously signed non-disclosure agreements (NDAs) to get around this issue. Nevertheless, he still must remind companies about this option and is hoping that soon “...companies email me and say ‘Hey, we’d like to sponsor a project’.”
More cross-pollination of student skills
In addition, being in the Midwest, home to many large insurers, Valpo has a strong actuarial science degree program. While the actuarial science program involves statistical analysis and math, it is currently disconnected from the data science program. However, this coming year Valpo will launch a change in requirements that mandates “Writing in the Disciplines”, a class that teaches students how to write acceptably in their respective disciplines. This course has led to a merging of colloquium courses, bringing together students from data science, actuarial science and other technical degree programs and thereby enabling cross-pollination of ideas and skills from related areas of study.
More Data Visualization courses
Another new curriculum development is the creation of two Data Visualization courses that are being offered for the first time this fall semester 2019. An interdisciplinary team of faculty from Art, Biology, Meteorology, Business, Computer Science, and Mathematics/Statistics designed both of these courses. In addition, the Introductory Data Visualization course is being co-taught by Dr. Elizabeth (Liz) Wuerffel, Associate Professor of Art.
Students will, of course, create visual presentations of the work they are doing with local and regional organizations. In addition, they will think about issues such as cultural color bias and perspective. Merging this knowledge and the idea of service mentioned above has led to several interesting assignments, such as examining signage around campus for accessibility and clarity. Dr. Schmitt says that the Art and Communication Departments have expressed a strong interest in expanding these courses into a full minor degree program around Data and Information Visualization.
More practical skills curriculum
Farther on the horizon, as the field of data science continues to evolve and as the “data science education conversation” evolves, Dr. Schmitt agrees with national recommendations that some fundamental courses will need to be redesigned. Existing courses will not be able to meet new industry needs or give students a more holistic picture of data science. For instance, he could envision bringing more master’s level courses into the undergraduate offering.
Dr. Schmitt also could see the need for practical classes on skills that universities currently assume students will teach themselves. One example, which he has been discussing with Dr. Nicholas Rosasco in the Computing and Information Sciences Department, is a course about how code is written and documented. Currently students learning to code must generally teach themselves how to read and write code documentation.
“Right now, students are just fumbling around trying to figure out how to read Pandas documentation on their own.”
observes Dr. Schmitt. As the need for more programmers grows in the field of data science, he could envision creating a course to teach this skill, emphasizing the practical written communication skills needed in data science and computer science, not just presentation or technical skills.
More integration with STEM subjects and other fields of study
Way out on the horizon, Dr. Schmitt envisions more integration with other academic fields that currently use and teach about data and parts of data science in a more siloed fashion. For instance, the College of Engineering teaches about data “as needed”. In other fields of study such as biology, researchers may throw machine learning at projects without really thinking about, or teaching students to think about, the underlying statistics. Dr. Schmitt is actively researching the existing state of cross-education as part of his National Science Foundation-funded grant, Misconceptions in Data Science Instruction.
This problem isn’t unique to Valpo, though. The integration of data science concepts into fields like these is a challenge for the larger data science community, admits Dr. Schmitt. It will take a while to get there, but when that integration arrives, Valpo will be ready with curriculum ideas to support it.
We'd like to extend our warm thanks to Dr. Schmitt and the entire faculty of the university departments mentioned in this article for their contributions. If you'd like to know more about Valparaiso's data science and related programs you can read about them here or check out this video.