Thursday, July 28, 2022

Statistics and Machine Learning: Why They are Important and Why You Should Care!

Thank you so much for continuing to use my website as a resource for your professional and entrepreneurial development. I am so happy that I can write about what I love—data analytics and machine learning—for those who can stand to benefit from it.

 

In a world that can be so harsh and cruel, I hope that the teaching of this subject will help level the playing field between those who are in power and those who seek to rise within, or move beyond, a toxic environment.

 

You ladies are incredible!

 

Ok, so for today’s post, I would like to discuss why the subject of statistics is so important, and how it is used in the new world we live in, where humans and machines are coexisting.

 

WHY?

 

Why do we use statistics in data analytics and machine learning? What is the primary goal of this field of science and mathematics?

 

Well, many who have contributed to this field argue—quite fervently—that the goal is to make an inference about a population based on data that was collected on a sample of that population. It is difficult to obtain complete data on any given population; arguably, it is impossible to do so.

 

And for this reason, mathematicians and scientists created a form of scientific inquiry that allows for the collection and computation of data on a smaller segment of the population, the sample, to make predictions about the larger population and even test assumptions about it.



Statisticians are careful to qualify their predictions by stating a MARGIN OF ERROR for their analysis. They understand that there is NO CERTAINTY in making predictions, and the margin of error allows them to describe a population with a stated degree of confidence.
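To make the margin of error a little more concrete, here is a minimal Python sketch. The numbers are made up for illustration, and I am assuming a 95% confidence level with a normal approximation (the 1.96 value). With a sample this small, a statistician would usually use a slightly larger t-value instead, so treat this as a rough picture rather than a finished analysis.

```python
import math
import statistics

# Hypothetical sample: minutes spent on a website by 10 visitors (made-up numbers).
sample = [12.0, 15.5, 9.0, 14.0, 11.5, 13.0, 10.5, 16.0, 12.5, 11.0]

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)

# Assumption: 1.96 is the z-value for roughly 95% confidence
# under a normal approximation.
z = 1.96
standard_error = sample_sd / math.sqrt(n)
margin_of_error = z * standard_error

# The interval is the sample mean plus or minus the margin of error.
lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error

print(f"Sample mean: {sample_mean:.2f}")
print(f"Margin of error: {margin_of_error:.2f}")
print(f"Approximate 95% interval: {lower:.2f} to {upper:.2f}")
```

The takeaway is that the prediction is reported as a range around the sample mean, not as a single certain number.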

 

Scientists, and now the new-age data scientists, have long attempted to infer attributes of entire populations. The process of data collection can be cumbersome, though: data is difficult to collect, and it often arrives in a form that is messy and almost unusable.

 

 

What have these scientists been trying to predict by using statistics?

 

1. They have been trying to predict future outcomes and returns for financial investments.

 

2. Data scientists and healthcare professionals have been using analytics and machine learning to predict various outcomes within the healthcare value chain. Such outcomes include health insurance claims, the number of emergency room visits, and cancer rates. They also study how the many variables within the data can be used to improve patient health outcomes.

 

3. Marketing professionals use analytics tools, such as those from Google and HubSpot, to measure and validate user behaviors and preferences on websites.

 

4. Software engineers have been able to collect data for the purpose of predicting the reliability of software in production. Further, they can use this data to determine whether to continue building a product or customer experience.

 

5. Economists also use statistics to predict both economic recovery and recessions given a multitude of factors. In the past, such factors have included housing costs, the unemployment rate, levels of education, and consumer activity.

 


As you can see, there are many industries that use statistics and machine learning. But what is “machine learning?” And how is it related to statistics?

 


Think of machine learning as the technology component, and statistics as the mathematical component. Machine learning typically relies on computer software and programming languages to spit out predictions. Statistics is the mathematical language behind machine learning algorithms. So, the two are intertwined.

 


I will continue to bore you at this point, with more statistical definitions that will be useful to know. They will help you interpret certain parameters within your projects and analysis, and they are helpful to know when conversing with your supervisors at work, or even laypersons who are not well-versed in the subject.

 


POPULATION and PARAMETER

 


As I mentioned before, the goal of statistics is to make inferences about a population of interest. But how is a population defined? Simply, a population is the complete collection of individuals, objects, or entities that you want to study, and it is usually very large. American citizens, millennial students, or even dogs can be considered populations.

 


A “population parameter” is a number, like a mean or a percentage, that describes the population.

 


SAMPLE and STATISTIC

 

In most cases, it is impossible to collect information about an entire population. So, information is collected from a sample instead. A sample is a smaller group drawn from the population, ideally in a way that represents it.

 


A “sample statistic” is a number, like a mean or a percentage, that describes the sample.

 


DON’T GET IT TWISTED!

 

When I think of the difference between parameter and statistic, I recall that parameter begins with the letter “p,” just like population. Statistic, on the other hand, begins with the letter “s,” just like sample. So, think of it like this:

 

A Parameter refers to a Population, P for P.

A Statistic refers to a Sample, S for S.
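If it helps to see P for P and S for S in action, here is a small Python sketch. The population is simulated (the ages, the population size of 10,000, and the sample size of 200 are all my own assumptions for illustration), which lets us do something that is rarely possible in real life: compute the parameter directly and compare it with the statistic.

```python
import random
import statistics

random.seed(42)  # fixed seed so the example is repeatable

# Hypothetical population: simulated ages of 10,000 people.
population = [random.randint(18, 90) for _ in range(10_000)]

# Parameter: a number that describes the whole Population (P for P).
population_mean = statistics.mean(population)

# Sample: a smaller group drawn at random from the population.
sample = random.sample(population, 200)

# Statistic: a number that describes the Sample (S for S).
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): {population_mean:.2f}")
print(f"Statistic (sample mean): {sample_mean:.2f}")
```

The two numbers should land close together without being identical, which is exactly why a sample statistic can stand in for a population parameter that we cannot measure.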

 


OTHER GOALS FOR STATISTICS

 

As I mentioned previously, the primary goal of statistics is to make an inference about a population of interest. Other goals are to test hypotheses, and to draw conclusions concerning the correlation between observed factors of interest.

 

The bulk of the content for this website will cover how you can run a Full Linear Regression Model, as it is one of the most common methods for making predictions. EVERYONE uses this method! So, I will focus more on that.
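As a small preview, here is a minimal Python sketch of the simplest version of linear regression: one predictor, fitted by least squares. The data (hours studied and exam scores) and the `predict` helper are made up for illustration. A full model would typically have more predictors and more diagnostic output than this, so please read it as a sketch of the idea only.

```python
import statistics

# Hypothetical data: hours studied (x) and exam score (y), made-up numbers.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 58.0, 61.0, 68.0, 71.0]

mean_x = statistics.mean(hours)
mean_y = statistics.mean(scores)

# Least-squares slope:
# sum of (x - mean_x) * (y - mean_y), divided by sum of (x - mean_x) squared.
numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
denominator = sum((x - mean_x) ** 2 for x in hours)
slope = numerator / denominator

# The fitted line passes through the point (mean_x, mean_y).
intercept = mean_y - slope * mean_x


def predict(x):
    """Predicted score for x hours of study, using the fitted line."""
    return intercept + slope * x


print(f"Slope: {slope:.2f}, Intercept: {intercept:.2f}")
print(f"Predicted score for 6 hours: {predict(6.0):.2f}")
```

The slope tells you how much the predicted score changes for each extra hour of study, and the intercept is the predicted score at zero hours.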

 


THANK YOU!

 

So, this will be the last boring post… I hope! I want to get into R programming and Regression analysis right away, so that you can get your feet wet, and begin your journey officially as data analysts or entrepreneurs. However, there will be many supplemental posts that offer explanations of how to interpret specific outputs of the regression, and yes, those posts will be boring. Yet, they are completely necessary!

 


You ladies are the best! Thank you for giving me the chance to help you with your transition, and I look forward to following your journey in this exciting world of data analytics and machine learning.
Location: Chicago, IL, USA