After 3 years on the job I’m studying for a Master’s degree in Data Science at UT Austin. Is it going to be worth it? Will it be manageable alongside work and other hobbies? Your guess is as good as mine.
How did I get here?
The first 3 years of my career, leading to my current role as a senior data scientist, have been self-taught. I took a job as a business analyst in pursuit of a faster-paced work environment and greater ownership of complete products, both of which a more digital-focused role offered over the mechanical engineering field I went to school for. Almost immediately, I found myself surrounded by people working on the types of projects I had previously only experienced in buzzword form - huge machine learning models making predictions using big data, supplying insights to other cloud-hosted microservices. Puttering along at a neighboring desk working on my own projects in Excel wasn’t going to cut it for long.
Before long, I was trialing every online data science class I could find. I went from knowing the bare minimum about simple linear regression to a functional understanding of dozens of advanced modeling approaches, and from a Computer Science 101 level of understanding of Python to being a trusted resource for training other team members on pandas and scikit-learn. More recently, I’ve been building call center optimization models from the ground up - advising on test design to gather unbiased training data, iterating through numerous model designs and approaches, and producing final production-grade pre-processing and prediction pipelines.
All of this came without setting foot in a classroom and could have been entirely free if I had been willing to do a bit more legwork vs. forking over a few bucks to DataCamp for their self-paced curriculum. The miracle of the democratization of learning did come with one notable side effect, though: impostor syndrome. I’m an outlier on my team, missing an advanced degree or even undergraduate exposure to anything but elementary statistics. Combine that with the intervening years dulling the memories of test anxiety and tedious homework and it was a perfect storm for me to head back to school. Plus, with these new online programs at Georgia Tech, UT Austin, and others all conveniently priced right at the average level of tuition reimbursement offered by most companies (including mine), what could go wrong?
Semester 1
The UT MSDS program has only been around since the spring of 2021, so my start date in the fall of that same year was only the 2nd full semester. On top of some inevitable ongoing logistical and technological teething issues, that meant the course offerings were fairly limited for my start date. Instead of wading back into the land of education slowly with a couple of easy electives, the schedule forced me to rip off the bandaid and get right into the core curriculum with ‘Probability and Simulation’ and ‘Machine Learning’.
Machine Learning
Ripping off the bandaid is one thing, but the Machine Learning bandaid was made of duct tape and sandpaper. Coming off of a 3-year hiatus from anything resembling a lecture or homework, the first 20-hour problem set was a huge shock to the system. After that first assignment, things were much more manageable, but it still left a bit of a ‘weed out class’ taste in my mouth that brought back memories of gems like ‘Dynamic Systems’ from my mechanical engineering days.
Unpredictable workload aside, though, it was great to get a theoretical understanding of techniques like Decision Trees, PCA, Kernel Methods, and more, that underpin so much of practical data science work. Even though 99% of the time the theory can be obfuscated and forgotten behind import sklearn, it’s nice to know what’s going on under the hood for the cases where I need to abandon the pre-packaged solution in favor of something more custom.
I still probably could have died happy without manually training a decision tree on paper, but hey, try anything once…
Probability and Simulation
Probability and Simulation was closer to what I expected. Probability is always a fun subject that feels more like a series of brain-teasers than schoolwork, but the real joy of this class was the simulation component. After various elementary statistics classes in high school and college, I’ve been able to carry out and interpret hypothesis tests and build confidence intervals by dutifully following formulas, but I never really built an understanding of how they work.
Carrying out hypothesis testing through hands-on simulations of sampling distributions and confidence intervals was massively helpful for my practical mind to build better intuition about how they work. Even more crucially, though, it built an intuitive understanding of the assumptions relied upon by the traditional formulas. Many classes and tutorials gloss over these assumptions, but you ignore them at great peril. I’ll be able to go forward as a much more responsible and informed statistical practitioner thanks to this approach.
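To make that concrete, here is a minimal sketch of the kind of simulation described above. It is not taken from the course; the distributions, the sample size of 10, the function name ci_coverage, and the trial count are all assumptions chosen for illustration. It builds the textbook 95% t-interval for the mean many times over and counts how often the interval actually contains the true mean, once for normal data (where the formula's assumption holds) and once for skewed exponential data (where it does not).

```python
import numpy as np

N = 10                  # sample size (illustrative assumption)
T_CRIT_95_DF9 = 2.262   # two-sided 95% t critical value for N - 1 = 9 degrees of freedom
TRIALS = 10_000         # number of simulated samples per distribution


def ci_coverage(draw_sample, true_mean, seed=0):
    """Fraction of simulated 95% t-intervals that contain the true mean."""
    rng = np.random.default_rng(seed)
    samples = draw_sample(rng, (TRIALS, N))          # one row per simulated sample
    means = samples.mean(axis=1)
    std_errs = samples.std(axis=1, ddof=1) / np.sqrt(N)
    half_widths = T_CRIT_95_DF9 * std_errs
    covered = np.abs(means - true_mean) <= half_widths
    return covered.mean()


# Normal data: the t-interval's normality assumption holds.
normal_cov = ci_coverage(lambda rng, size: rng.normal(5.0, 2.0, size), true_mean=5.0)

# Exponential data (mean 5): heavily skewed, so the assumption is violated.
skewed_cov = ci_coverage(lambda rng, size: rng.exponential(5.0, size), true_mean=5.0)

print(f"normal data coverage: {normal_cov:.3f}")
print(f"skewed data coverage: {skewed_cov:.3f}")
```

With normal data the coverage should land close to the advertised 95%, while the skewed data should fall noticeably short at this sample size. The formula is identical in both runs; only the validity of its assumptions changes.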
Findings and Looking Forward
For all of the scheduling difficulties and dreaded post-work lecture sessions, my first semester in the UT MSDS program has been enormously rewarding. Taking classes while employed adds a lot to each already-busy week, but a steady paycheck already in hand takes a lot of the pressure off of a neurotic academic perfectionist like myself. Do I need to stay up doing busy work for a concept I already use in my day job? Or should I take the late penalty and put that energy towards truly novel concepts? When the learning is truly just for myself and devoid of any nebulous pressure like ‘will my parents be proud of me?’ or ‘will this be enough to get me a good job?’, the answers to those questions become a lot more obvious.
Diving into theory has also been a refreshing contrast to the comparatively harsher world of data science in a production environment. No meetings to run, no stakeholder buy-in to negotiate, no political angling to get a new model revision on the release schedule - just me and my pencil, trying to wrap my brain around eigenvalues for the millionth time.
2 classes felt manageable in the fall, even with a full schedule of bike racing taking up many of my weekends. That lightens up over the winter and into the spring, so I’m attempting 3 classes next semester: ‘Data Structures and Algorithms’, ‘Data Visualization’, and ‘Regression and Predictive Modeling’. How will I fare? Check back here in May to find out.