Wednesday, August 10, 2022

Essay: My Love For Linear Regression, LASSO, And Business Education

Two years ago, during the summer of 2020, I was enrolled in a master’s program in management at Georgetown University. It is a generalist degree where students learn about the broader aspects of management.


Topics like Finance, Accounting, Economics, and Supply Chain Management were covered, as well as Consulting and Business Communication. I didn’t end up finishing the program; I withdrew due to COVID health concerns.

 

Yet, despite what by all accounts was a disappointing experience, one course resonated with me: Business Analytics. It was a discipline I had read about in Harvard Business Review, but I never had a concrete sense of what the topic actually was.


At its core, the course is a statistics course combined with the R programming language. The more I learned, the broader my perspective grew. The course is so much MORE than another math class; the skills it teaches can help business managers make high-impact decisions in their organizations.

 

Now, this class was tough. Homework assignments were 3, no, 4 hours long. Students needed to assimilate the statistical material learned in class and apply it in the R programming language.


In fact, coding was the main component of the class. My instructor, Professor Jose, had one goal for the entire class: to create a learning environment in which students would be able to run a linear regression model from scratch. That is a goal worthy of anyone learning analytics, and I am optimistic about learning even more advanced methods for making predictions from any data set.
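To make that goal concrete, here is a minimal sketch of a linear regression run in R. The built-in mtcars data set and the choice of mpg (fuel efficiency) and wt (vehicle weight) are my own illustrative assumptions, not material from the course.

# Fit a simple linear regression on R's built-in mtcars data set.
# mpg (fuel efficiency) is the response; wt (vehicle weight) is the predictor.
model <- lm(mpg ~ wt, data = mtcars)

# Coefficients, p-values, and R-squared for the fitted model.
summary(model)

# Predict mpg for a hypothetical 3,000-lb car (wt is measured in 1,000-lb units).
predict(model, newdata = data.frame(wt = 3.0))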

 

Since Georgetown, I have taken two sequential semesters of statistics in my hometown, Miami (at FIU), to broaden my skill set and to acquire familiarity with the mathematical concepts that I had difficulty understanding while I was at Georgetown.


Having done this, I am much better at statistics. As a side note, I recommend continuing education at the college level in the evening, in conjunction with work during the day. You can use what you learn in the classroom at work the very next day.

 

At this point, I am looking into internships, training and development programs at management consulting firms, and even research positions that will help me acquire more analytical skills.


I am now enrolled in a Business Analytics program at DePaul University. For the next year or so, I will be learning how to use data to make predictions about certain business questions and problems. I couldn’t be more excited.

 

Now that you are acquainted with my situation, I will get to the point of this essay—Multivariate Linear Regression.

 

Since the summer of 2021, I have been studying and learning R programming, and my experience couldn’t be more satisfying. There is something empowering about typing in a line of code and NOT seeing an error message. Those of you who have been frustrated with programming and coding will know what I am referring to.

 

At DePaul, I am learning how to code linear regression models, and there are two fundamental ways of model building that have caught my interest. 


The first way to create a linear model is to build it from the bottom up. That is, you pick a response variable, the variable that you wish to predict, and predictor variables, the variables that influence the response.


The standard method for linear modeling is straightforward: you begin with a response and start with the predictors that have the highest correlation with it. You then build out your model, adding a single predictor at a time, keeping each one only if it improves the model’s adjusted R-squared.
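Here is a rough sketch of that bottom-up process in R. The built-in mtcars data set, the mpg response, and the simple greedy loop are my own illustrative assumptions, not a prescribed recipe.

# Bottom-up model building on R's built-in mtcars data: rank candidate predictors
# by their correlation with the response, then add them one at a time, keeping
# each predictor only if it improves adjusted R-squared.
response   <- "mpg"
candidates <- setdiff(names(mtcars), response)

# Order candidates by the strength of their correlation with the response.
cors       <- sapply(candidates, function(p) abs(cor(mtcars[[response]], mtcars[[p]])))
candidates <- names(sort(cors, decreasing = TRUE))

selected    <- character(0)
best_adj_r2 <- -Inf

for (p in candidates) {
  trial <- c(selected, p)
  fit   <- lm(reformulate(trial, response = response), data = mtcars)
  adj   <- summary(fit)$adj.r.squared
  if (adj > best_adj_r2) {          # keep the predictor only if it helps
    selected    <- trial
    best_adj_r2 <- adj
  }
}

selected       # predictors retained in the final model
best_adj_r2    # adjusted R-squared of the final model

This is just one simple variant of forward selection; R’s built-in step() function automates a similar stepwise search, though it uses AIC rather than adjusted R-squared.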

 

As amazing as this method is for building linear models with many predictors, there is another method that I have fallen in love with since learning of it: LASSO regression. It is the opposite of the standard approach to linear regression model building in that the user can build their model from the TOP down.


Programmers feed ALL of the variables in a data set to an algorithm that applies a penalty, shrinking the coefficients of insignificant predictors toward zero. Analysts can then “feature select” the variables whose coefficients survive the shrinkage and use them to build an effective regression model. This regularization technique is especially useful if there are dozens, if not hundreds, of predictor variables in a data set.
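Here is a sketch of what that looks like in R, assuming the glmnet package, a widely used LASSO implementation, is installed; the built-in mtcars data and the mpg response are again my own illustrative choices.

library(glmnet)  # widely used R implementation of LASSO (assumed installed)

# All candidate predictors go in as a matrix; mpg is the response.
x <- as.matrix(mtcars[, setdiff(names(mtcars), "mpg")])
y <- mtcars$mpg

# alpha = 1 selects the LASSO penalty; cross-validation chooses the penalty strength (lambda).
set.seed(1)
cv_fit <- cv.glmnet(x, y, alpha = 1)

# Coefficients at the lambda with the lowest cross-validated error.
# Predictors whose coefficients are shrunk exactly to zero have effectively been dropped.
coef(cv_fit, s = "lambda.min")

The nonzero coefficients that survive the penalty are the “feature-selected” variables; they can then be carried into an ordinary regression model.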

 

The LASSO regression algorithm is more involved, and though I have a basic understanding of it, I plan to make use of my time to run one from scratch. The more variables, the more satisfying, the more glory!


I wish to use this brief essay to document my initial interest in regression analysis so that I can begin to acquire more advanced skills and, eventually, add value to an organization or business entity.

Location: Chicago, IL, USA