The recent March Madness college basketball tournament (one of the best in years, by the way) inspired me to take a closer look at the characteristics of successful college hoops teams. If you’ve been following along, you have already seen the journey from web scraping code all the way to colorful graphs. If you haven’t been in the loop, here’s a quick review + one last visual to finalize my review of the 2021 college men’s basketball season.

The Steps

A basic element of each step is pictured:

Scrape

Clean

Start making Plots

Double Down on Graph Quality

Check out my last post for those sharper looking visuals.

Final Graphic

There is always criticism that the tournament is a bad way to show the top teams and crown a champion. I usually agree with that statement, but 2021 seems like the top teams actually held up pretty well:

The highest statistically ranked teams (thanks Basketball References for creating the SRS metric!) seemed to be the top performers. Gonzaga was the highest ranked and they lost in the championship game, then Baylor (Champs) and Houston (final four team) were the next in line. This was fairly easy to deduce; the real power of this graph comes in seeing how the non top teams did. You can choose to read the graph from top right to bottom left to see SRS and winning %, or you can just read it from the top down to only see SRS.

Either way, you’ll see that the color scale on the legend *kind of* matches how the teams sorted out on the graph. For example, the last final 4 team (UCLA) trailed many teams that didn’t go as far as them, a couple of good teams lost in the first round, and one of the elite 8 teams (Oregon St) totally outperformed. On the other hand, the colors on the key generally match how the teams sorted out on the graph, so it wasn’t complete chaos among the lower teams.

(It’s important to note that SRS and winning % included regular & postseason games. So yes, if a team had a crazy run in the tournament their ranking and record would go up, but those metrics would still be most heavily influenced by their regular season performance. If it was just post season that affected the metrics, it would be a self-serving graph that didn’t tell us much.)

Let me know if the comments if you agree, and thanks for following along!

Link to code/github repo