Data Challenge Scoring Recap - How'd They Do?


We’ve announced the winners and crunched the data from our recent university data challenge. We had great participation and want to thank all the university professors who encouraged students to register for the challenge. Following is a summary of overall score results and a bit more information on student participants.

University Data Science and Other Degree Programs Represented

660 students from 72 universities signed up for the Data Challenge.  They represented a wide range of educational institutions and geographies.  We saw particularly large participation from students at:

Columbia University

University of Cincinnati

University of Southern California

State University of New York at Buffalo

DePaul University

Northeastern University

University of Texas, Dallas

Washington University in St. Louis

Arizona State University

The majority (but certainly not all) of participants from these schools were enrolled in a computer science, data science or data analytics degree program.  In addition, it was encouraging to see a large variety of participants from lesser-known programs such as Calvin University’s Bachelor of Data Science, University of British Columbia’s Bachelor of International Economics and Millsaps College’s Bachelor of Science in Math.  This range of backgrounds confirms the notion that data science is everywhere!

Top Scoring Data Science and Analytics Programs

The top three universities with the highest average score were also the ones with the largest number of participating students.  Both Columbia and University of Southern California had participants from a diverse set of degree programs, while the University of Cincinnati largely had participation from students in its Master's in Business Analytics degree program. Congratulations to all participants from these top universities!

Data Exploration, Wrangling and Modeling Scores


The table below summarizes the average score and the score range for each of the three skill categories tested. The universities named are those that had student participants who achieved the highest score in the range.

| Skill Area Tested and Example Skill Sets | Average Score (out of 5) | Range of Scores | Top Scoring Universities |
| --- | --- | --- | --- |
| Data Exploration: data profiling, descriptive statistics, distributions, correlations, visualizations | 3.3 | 0.5 to 5 | Boston University; University of Southern California; Columbia University; University of Cincinnati |
| Data Wrangling: data ingest, filtering, cleansing, formatting, joins, aggregation, transformation, feature engineering, big data wrangling | 3.2 | 0 to 5 | UT Dallas; University of Southern California; University of British Columbia; Washington University St. Louis; University of Cincinnati; Georgia State University |
| Modeling: hypothesis testing, machine learning concepts, machine learning architectures, model tuning, model evaluation | 2.7 | 0 to 5 | Columbia University; University of Southern California; University of San Francisco |

 

Data Exploration Skills

For data exploration, the majority of students scored in the mid-level range, around 3 to 3.5.  A good part of data exploration is visualization, and it’s likely that students are still coming up the learning curve on how best to approach visualizations.

Data Wrangling Skills

It’s encouraging to see that so many university programs are producing students who are quite adept at data wrangling.  This would indicate that students are getting a fair amount of hands-on experience with messy data – something employers will definitely want to see!  We’re seeing this more and more as we speak to universities for our Quantcrunch Report series.

Data Modeling Skills

On the other hand, the average score for modeling skills was quite a bit lower than for data wrangling and data exploration, at 2.7.  We wondered why, so we looked a little deeper at the data we have.  The relatively lower modeling scores were spread across the board: most participating universities, even top-scoring ones like Columbia, had students who scored relatively lower in this skill area.  And while there is anecdotal evidence that students majoring in “peripheral” subjects such as finance, economics or analytics (instead of computer science or data science) were somewhat more apt to score lower on modeling, most degree programs represented by this diverse set of participants showed some lower scores in modeling.

The two winners, however, both scored very high in this data science skill category, which put them ahead of the pack.  High modeling scores are also what drove Columbia University students as a group into first place for the highest average score.

A key takeaway here is that compared to data wrangling and data exploration, modeling requires a more advanced skill set.  Data wrangling involves manipulating data, which can get pretty hairy, but modeling involves feature selection, model training and hyperparameter tuning, and performance evaluation.

The moral of the story? Practice makes perfect! Students should get out there and practice modeling on real data sets! It’s also worth noting that we often see employers using a score somewhere around 3 or 3.5 as a “cutoff” for any particular skill area - so get some practice in!
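To make the cutoff idea concrete, here is a minimal Python sketch that flags the skill areas where a candidate falls below a chosen cutoff. The function name `skills_below_cutoff` and the candidate's scores are invented for illustration; this is not how QuantHub or any employer actually applies a cutoff.

```python
def skills_below_cutoff(scores, cutoff=3.0):
    """Return the skill areas scoring below the cutoff, sorted by name."""
    return sorted(skill for skill, score in scores.items() if score < cutoff)


# Hypothetical candidate scores (out of 5), made up for this example.
candidate = {"Data Exploration": 3.2, "Data Wrangling": 4.0, "Modeling": 2.5}

below_3 = skills_below_cutoff(candidate, cutoff=3.0)
below_3_5 = skills_below_cutoff(candidate, cutoff=3.5)

print("Below 3.0:", below_3)
print("Below 3.5:", below_3_5)
```

Moving the cutoff from 3 to 3.5 changes which skill areas this made-up candidate would need to practice, which is why a mid-range score can fall on either side of the line.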

Scores for Top Participating Universities

A small handful of universities represented the majority of university students participating in the data challenge. The tables below summarize the scores for the top 4 universities with the greatest number of participants.

| University | % of All Participation |
| --- | --- |
| University of Southern California | 26% |
| Columbia University | 22% |
| University of Cincinnati | 11% |
| University of Texas at Dallas | 4% |
| Other schools | 40% |

   

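| Skill Rankings | University | Avg Score |
| --- | --- | --- |
| Data Exploration | University of Cincinnati | 3.8 |
| Data Exploration | Columbia University | 3.5 |
| Data Exploration | University of Texas at Dallas | 3.5 |
| Data Exploration | University of Southern California | 3.3 |
| Data Wrangling | University of Texas at Dallas | 4.6 |
| Data Wrangling | University of Cincinnati | 3.7 |
| Data Wrangling | Columbia University | 3.6 |
| Data Wrangling | University of Southern California | 3.5 |
| Modeling | Columbia University | 3.8 |
| Modeling | University of Cincinnati | 2.8 |
| Modeling | University of Southern California | 2.8 |
| Modeling | University of Texas at Dallas | 2.5 |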
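For readers who want to slice these numbers themselves, here is a small Python sketch using the average scores from the skill rankings above. It finds the leading university for each skill and each university's strongest and weakest skill. The function names are our own for illustration, and the sketch works only on the published averages, not on individual student scores.

```python
# Average scores (out of 5) copied from the skill rankings above.
avg_scores = {
    "University of Cincinnati": {"Data Exploration": 3.8, "Data Wrangling": 3.7, "Modeling": 2.8},
    "Columbia University": {"Data Exploration": 3.5, "Data Wrangling": 3.6, "Modeling": 3.8},
    "University of Texas at Dallas": {"Data Exploration": 3.5, "Data Wrangling": 4.6, "Modeling": 2.5},
    "University of Southern California": {"Data Exploration": 3.3, "Data Wrangling": 3.5, "Modeling": 2.8},
}


def top_by_skill(scores):
    """Return {skill: university with the highest average for that skill}."""
    skills = next(iter(scores.values())).keys()
    return {s: max(scores, key=lambda u: scores[u][s]) for s in skills}


def strongest_and_weakest(scores):
    """Return {university: (strongest skill, weakest skill)}."""
    return {u: (max(by, key=by.get), min(by, key=by.get)) for u, by in scores.items()}


leaders = top_by_skill(avg_scores)
profile = strongest_and_weakest(avg_scores)

for skill, university in leaders.items():
    print(f"{skill}: {university} ({avg_scores[university][skill]})")
for university, (best, worst) in profile.items():
    print(f"{university}: strongest in {best}, weakest in {worst}")
```

Each of the three skills has a different leader among these four schools, and modeling is the weakest area for three of the four.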

 

Columbia University drew participants from a wide variety of degree programs including its Master’s in Data Science, Master’s in Business Analytics, Master’s in Quantitative Methods in Social Sciences and Master’s in Operations Research.  The candidates from the University of Texas at Dallas almost all hailed from its Master’s in Business Analytics program. The same goes for students from the University of Cincinnati.  The University of Southern California drew from a wide range of programs including bachelor’s degree programs in computer science, economics, applied mathematics, and data informatics, as well as its master’s programs in applied data science, computer science and even spatial data science.

What does this variety in score outcomes and degree programs show? What everyone keeps saying:

Data science is a team sport!

With a few rare exceptions, individual students and groups of students in different degree programs have areas of relative strength and weakness.  Employers need to take that into account when building data science teams.

The beauty of QuantHub is that employers (and candidates) can directly compare scores for different data-related skill sets across candidates, just as the tables above do (with more detail). So if you’re building an advanced analytics team, you might, for example, recruit a candidate from UT Dallas who is in graduate school for business analytics, excels at data wrangling and is pretty good at data exploration. Then you might round out the team’s skill set by recruiting an undergraduate in statistics from Columbia University who has solid modeling skills.
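As a rough illustration of that kind of comparison, the Python sketch below checks which skill areas a prospective team already covers. The candidate labels, their scores, the 3.5 threshold and the function names are all hypothetical; this is not QuantHub's scoring or matching logic.

```python
SKILLS = ("Data Exploration", "Data Wrangling", "Modeling")


def team_coverage(team, cutoff=3.5):
    """Map each skill to the team members scoring at or above the cutoff."""
    return {
        skill: [name for name, scores in team.items() if scores[skill] >= cutoff]
        for skill in SKILLS
    }


def uncovered_skills(team, cutoff=3.5):
    """Return the skills that no team member covers at the cutoff."""
    return [skill for skill, members in team_coverage(team, cutoff).items() if not members]


# Hypothetical candidates with made-up scores (out of 5).
wrangler = {"Data Exploration": 3.5, "Data Wrangling": 4.6, "Modeling": 2.5}
modeler = {"Data Exploration": 3.0, "Data Wrangling": 3.0, "Modeling": 4.0}

gaps_before = uncovered_skills({"wrangler": wrangler})
gaps_after = uncovered_skills({"wrangler": wrangler, "modeler": modeler})

print("Gaps with one hire:", gaps_before)
print("Gaps with both hires:", gaps_after)
```

With only the first hire, modeling is uncovered. Adding the second closes the gap, even though neither candidate clears the threshold in every skill alone.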

And don’t forget that one of our data challenge winners doesn’t even hail from one of the top 4 participating universities: she is a Chinese national at University of San Francisco majoring in Bioinformatics.  It's a good reminder for employers to look far and wide for data talent and for graduates to look far and wide for places to apply their skill sets!

If you’d like to know more about the winners behind the top Data Challenge scores, check out our interviews with Sophie Bair of Columbia University and Jie Han of University of San Francisco.

 

 


 

We've also compiled more profile information for participating students in our entertaining read, "Data Science Dream Jobs". Check it out!

 

 

 

