Two years ago, during the summer of 2020, I was enrolled in a master’s program in management at Georgetown University. It is a generalist degree in which students learn about the broader aspects of management.
Topics like Finance, Accounting, Economics, and Supply Chain Management were covered, as well as Consulting and Business Communication. I didn’t end up finishing the program; I withdrew due to COVID health concerns.
Yet, despite what was by all accounts a negative experience, one course resonated with me: Business Analytics. It was a discipline I had read about in Harvard Business Review, but I never had a concrete sense of what the topic actually was.
The course itself, on a fundamental level, is a statistics course combined with the R programming language. The more I learned in the course, the broader my perspective grew. The course is so much MORE than another math class; the skills it teaches can aid business managers in making high-impact decisions within their organizations.
Now, this class was tough. Homework assignments were 3, no, 4 hours long. Students needed to assimilate the statistical material learned in class and apply it in the R programming language.
In fact, coding was the main component of the class. My instructor, Professor Jose, had one goal for the entire class: to facilitate a learning environment in which students could run a linear regression model from scratch. That is a goal worthy of anyone learning analytics, and I am optimistic about learning even more advanced methods for making predictions from any data set.
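To make that goal concrete, here is what a minimal version of the exercise looks like in R. This is my own illustration, using R’s built-in mtcars data set rather than anything from the course:

```r
# Fit a simple linear regression on R's built-in mtcars data set:
# predict fuel economy (mpg) from vehicle weight (wt).
model <- lm(mpg ~ wt, data = mtcars)

# The fitted intercept and slope, and how much variance is explained.
coef(model)
summary(model)$r.squared
```

Running `summary(model)` prints the full table of coefficients, standard errors, and p-values, which is exactly the output the course taught us to read.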
Since Georgetown, I have taken two sequential semesters of statistics in my hometown, Miami (at FIU), to broaden my skill set and to gain familiarity with the mathematical concepts I had difficulty understanding while at Georgetown.
I am so much better at
statistics, having done this. As a side note, I recommend continuing education
at the college level during the evening, in conjunction with work during the
day. You can use what you learn in the classroom immediately the next day in
your work.
At this point, I am looking into internships, training and development programs at management consulting firms, and even research positions that will help me acquire more analytical skills.
I am now enrolled
in a Business Analytics program at DePaul University. For the next year or so,
I will be learning how to use data to make predictions about certain business
questions and problems. I couldn’t be more excited.
Now that you are acquainted with my situation, I will
get to the point of this essay—Multivariate Linear Regression.
Since the summer of 2021, I have been studying and
learning R programming, and my experience couldn’t be more satisfying. There is
something empowering about typing in a line of code and NOT seeing an error
message. Those of you who have been frustrated with programming and coding will
know what I am referring to.
At DePaul, I am learning how to code linear regression models, and there are two fundamental ways of model building that have caught my interest.
The first way to create a linear model is to build it from the bottom up. That is, you pick a response variable, the variable you wish to predict, and predictor variables, the variables that influence the response.
The method for standard linear modeling is straightforward: you begin with a response and build the model with predictors that have a high correlation with that response. You then build out your model, adding a single predictor at a time and keeping the ones that raise the adjusted R-squared, until additional predictors no longer improve it.
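That bottom-up procedure can be sketched directly in R. The loop below is my own toy forward-selection sketch on the built-in mtcars data set, not code from any course: starting from an empty model for mpg, it adds whichever remaining predictor raises the adjusted R-squared the most, and stops when no addition helps.

```r
# Forward selection by adjusted R-squared on R's built-in mtcars data.
predictors  <- setdiff(names(mtcars), "mpg")
chosen      <- character(0)
best_adj_r2 <- -Inf

repeat {
  candidates <- setdiff(predictors, chosen)
  if (length(candidates) == 0) break
  # Adjusted R-squared of the current model with each candidate added.
  scores <- sapply(candidates, function(v) {
    f <- reformulate(c(chosen, v), response = "mpg")
    summary(lm(f, data = mtcars))$adj.r.squared
  })
  if (max(scores) <= best_adj_r2) break   # no improvement: stop adding
  best_adj_r2 <- max(scores)
  chosen      <- c(chosen, names(which.max(scores)))
}

chosen       # predictors selected, in the order they were added
best_adj_r2  # adjusted R-squared of the final model
```

On mtcars, the first predictor picked is vehicle weight (wt), the variable most strongly correlated with mpg, which matches the intuition of starting with the highest-correlation predictor.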
As amazing as this method is for building multivariate linear models, there is another method that I have fallen in love with since learning of it. This method is known as a LASSO regression. This method is the opposite of the standard approach to linear regression model building in that the user can build their models from the TOP down.
Programmers can use an algorithm that starts with ALL of the variables in a data set and shrinks the coefficients of the uninformative ones, some all the way to zero. Analysts can then “feature select” the variables that remain statistically meaningful and use them to build an effective regression model. This regularization technique is especially useful when there are dozens, if not hundreds, of predictor variables in a dataset.
The LASSO regression algorithm is more involved, and though I have only a basic understanding of it, I plan to make use of my time and run one from scratch. The more variables, the more satisfying, the more glory!
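As a first step toward that, here is a minimal coordinate-descent LASSO in base R. This is my own toy sketch, not the glmnet package that analysts typically use in practice, and it assumes standardized predictors and a centered response. It exists only to show the soft-thresholding step that drives weak coefficients to exactly zero:

```r
# Soft-thresholding: shrink z toward zero, and snap small values to 0.
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Toy LASSO via coordinate descent. X is standardized, y is centered,
# and lambda controls how aggressively coefficients are shrunk.
lasso_cd <- function(X, y, lambda, iters = 100) {
  X <- scale(X)
  y <- y - mean(y)
  n <- nrow(X)
  p <- ncol(X)
  beta <- rep(0, p)
  for (it in seq_len(iters)) {
    for (j in seq_len(p)) {
      # Partial residual: the current fit with predictor j left out.
      r_j <- y - X[, -j, drop = FALSE] %*% beta[-j]
      z_j <- sum(X[, j] * r_j) / n
      beta[j] <- soft_threshold(z_j, lambda) / (sum(X[, j]^2) / n)
    }
  }
  setNames(beta, colnames(X))
}

# On mtcars: a moderate lambda keeps the strong predictors, while a
# huge lambda shrinks every coefficient to exactly zero.
X <- as.matrix(mtcars[, c("wt", "hp", "qsec")])
lasso_cd(X, mtcars$mpg, lambda = 1)
lasso_cd(X, mtcars$mpg, lambda = 100)
```

Raising lambda is the “top down” knob: every predictor starts in the model, and the penalty decides which ones survive.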
I
wish to use this brief essay to document my initial interests in regression
analysis so that I can begin to acquire more advanced skills, and eventually,
add value to an organization or business entity.