Thursday, July 28, 2022

Stats Review: 6 Basic Terms That You Should Know For Data Science and Analytics

on July 28, 2022 in Blog, Statistics Terms

As Porn Stars, you don’t have time to get into the weeds of mathematics. You need to be able to search for information in an effective way so that you can learn quickly and efficiently.

In that vein, here is a list of statistical terms that every Sex Worker should know before she embarks on her new journey as a data entrepreneur.

DESCRIPTIVE STATISTICS

First, we shall look at DESCRIPTIVE STATISTICS. So, what are they? Descriptive statistics are summary statistics that describe the numerical traits of a given variable or data set. Think of descriptive statistics as a closer look at what certain features mean within any set (or sets) of data.

Let’s get specific. Here is the list of descriptive statistics that any Porn Star should be familiar with, and then I will define them, simply:

MEAN

MEDIAN

MODE

STANDARD DEVIATION

QUARTILES

PERCENTILES

MEAN

We will begin with the MEAN. The mean, denoted as “μ” for population data (pronounced as “mew”… yes, like the cute Pokémon), is the sum of all values within a feature or data set, divided by the total number of data values. What does that actually mean (pun intended)? Let’s say that we have a data set with the following numbers:

{2, 4, 6, 8, 10}

Simply add all the values together. Or,

2 + 4 + 6 + 8 + 10, which equals 30.

Then, divide that sum by the number of observations we have in the data set. There are 5 numbers in this set.

So, the mean of this data set is 30 divided by 5, which equals 6.

30 / 5 = 6

Additionally, the mean is commonly known as the “average,” and it is also referred to as the measure of the center point of the data.
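For anyone curious how this looks in software, here is a minimal Python sketch of the same arithmetic, using the data set from the example above. The variable names are my own choices for illustration; `statistics.mean` is part of Python's standard library.

```python
import statistics

# Data set from the example above
data = [2, 4, 6, 8, 10]

# Add all the values together, then divide by the number of values
total = sum(data)      # 2 + 4 + 6 + 8 + 10 = 30
count = len(data)      # there are 5 values
mean = total / count   # 30 / 5

print(mean)  # 6.0

# The standard library gives the same answer in one step
library_mean = statistics.mean(data)
```

Both approaches give 6, matching the calculation done by hand.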

MEDIAN

Let’s discuss the MEDIAN. The median is the middle value of data that has been arranged in ascending order. Unlike the mean, there is no division involved in calculating it UNLESS there is an even number of values in your data set. If there is an even number of data points in your data set, you will take the two points in the middle, add them, and divide by two to find the median. What does this look like? I’ll explain.

If you had a data set with an odd number of observations, then after ordering the data from smallest to largest, you can take the middle term of the data set, and that will be your median. That looks like this:

{2, 4, 6, 8, 10}

2 and 10 cancel out, 4 and 8 cancel out, and all you are left with is 6. Six is the median of this data set. As it happens, it is also the mean. But what if we had a data set with an even number of data points? Consider this data set:

{2, 4, 6, 8, 10, 12}

As we can see, there are 6 total numbers in this data set. Six is an even number. So, then we can cancel out 2 and 12, and we can cancel out 4 and 10, so that we are left with 6 and 8. Next we add 6 and 8 together, which equals 14. Then we divide that summation, 14, by two, which equals 7. Seven is our median for this data set.

It is important to note that when you are adding and dividing the two most middle data points (recall, the 6 and the 8), you are effectively averaging them together. Refer to the definition of mean, previously mentioned.
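As a rough sketch, the two cases above (odd and even numbers of data points) look like this in Python. The variable names are hypothetical; `statistics.median` comes from the standard library and sorts the data for you.

```python
import statistics

# The two data sets from the examples above
odd_data = [2, 4, 6, 8, 10]
even_data = [2, 4, 6, 8, 10, 12]

# Odd number of values: the median is the single middle value
odd_median = statistics.median(odd_data)

# Even number of values: the median is the average of the two middle values
even_median = statistics.median(even_data)

print(odd_median)   # 6
print(even_median)  # 7.0
```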

I would like to note that mean and median are NOT the same measurement and should NOT be used interchangeably. Hopefully, you can see that these two measurements are calculated differently, even though both describe the center of the data.

Before I continue with these statistical terms, I would like to calm your concerns, if you have any. If you take a typical statistics course, the data sets that you will be making calculations on will not exceed 20 data points. You will NOT be expected to order the data by hand unless the data set is small. You will be allowed to use a calculator, so you can make these calculations quickly. If you are taking a statistics course within the context of Data Science, however, you will be using computer software to assist you with the heavy lifting. Statistical software like R and SAS will typically be used, and occasionally Python, depending on the class curriculum. So, fear not!

Let’s continue.

MODE

We can now discuss MODE. This term is relatively simple to understand. The mode is the value that appears most frequently in the data set. Here is an easy example:

{1, 2, 3, 3, 4, 5, 6, 7, 7, 7, 8, 9, 10}

In the above example, the mode would be 7, because it is the value that occurs the most frequently in the data set. Now, you may be wondering whether it is possible for a data set to have many modes, and the answer is, ABSOLUTELY!

{1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8, 8, 9, 10}

In the above example, 3 and 8 occur the most frequently. So, in this case we have TWO modes in our data set. This is known as a BIMODAL data set, or a set with two modes. We can also have TRIMODAL data sets, with three modes. MULTIMODAL is the general term for a data set with several modes, and it is typically used when there are more than three.
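Here is a small Python sketch of both mode examples above. It uses `statistics.multimode` from the standard library (available in Python 3.8 and later), which returns a list of every value tied for most frequent. The variable names are my own.

```python
import statistics

# The two data sets from the examples above
one_mode_data = [1, 2, 3, 3, 4, 5, 6, 7, 7, 7, 8, 9, 10]
two_mode_data = [1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8, 8, 9, 10]

# multimode returns a list of all values that appear most often
single = statistics.multimode(one_mode_data)
double = statistics.multimode(two_mode_data)

print(single)  # [7]
print(double)  # [3, 8]

# A data set is bimodal when exactly two values tie for most frequent
is_bimodal = len(double) == 2
```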

RANGE

RANGE is next on our list, and this descriptive statistic has more components than the previous terms. Let’s go over them now.

The range is the difference between the smallest value (known as the “minimum value”) and the largest value (known as the “maximum value”) in the data set.

The process for finding the range is straightforward. First, you arrange the numbers in your data set from smallest to largest. Then you take the largest number and subtract the smallest number from it. Let’s see what this looks like in the following data set:

{2, 4, 7, 15, 19, 23}

Using the process noted above, can you determine the range of this data set? If you found the answer to be 21, then you would be correct. Here is another look at the formula:

Largest number – Smallest number = Range

23 – 2 = 21
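The same formula can be sketched in Python with the built-in `max` and `min` functions, which find the largest and smallest values without needing to sort first. The variable names here are hypothetical.

```python
# Data set from the example above
data = [2, 4, 7, 15, 19, 23]

largest = max(data)    # the maximum value
smallest = min(data)   # the minimum value

# Largest number - Smallest number = Range
data_range = largest - smallest

print(data_range)  # 21
```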

STANDARD DEVIATION

Let’s discuss the STANDARD DEVIATION only briefly, as in practice this descriptive statistic will be computed with programming and scripting languages like R, SAS, or Python. Let me reiterate, you will NOT be expected to calculate the standard deviation (or variance) without the use of software, UNLESS you are taking a traditional statistics course in school.

The standard deviation can be defined as a measure of the dispersion (or “spread”) of the values around the mean of a data set. The standard deviation, denoted as σ (or “sigma”) for population data, can be calculated by taking the square root of the variance.
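To make the link between variance and standard deviation concrete, here is a minimal Python sketch. It assumes the data set {2, 4, 6, 8, 10} from the MEAN section is the entire population, so it uses the population functions `statistics.pvariance` and `statistics.pstdev`. For a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` would be used instead, and they give slightly larger values.

```python
import math
import statistics

# Assumption: this data set (borrowed from the MEAN section) is the whole population
data = [2, 4, 6, 8, 10]

# Population variance: the average squared distance from the mean
variance = statistics.pvariance(data)

# Population standard deviation: the square root of the variance
std_dev = statistics.pstdev(data)

print(variance)           # 8
print(round(std_dev, 2))  # 2.83

# Check that the standard deviation really is the square root of the variance
matches = math.isclose(std_dev, math.sqrt(variance))
```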

These two terms—standard deviation and variance—might seem abstract right now, but when we start creating prediction equations with Regression techniques, they will become clearer. Just know that these terms are tremendously important within the scope of statistics and data analytics. Keep them in the back of your mind for now.

Let’s continue.

QUARTILES AND PERCENTILES

QUARTILES divide the data set into four equal parts, much like the median splits the data in half. Each part, in effect, holds roughly 25% of the data. It takes only 3 cut points to make four parts, so there are 3 main quartiles of interest. This might seem confusing and counter-intuitive, but let me explain.

Quartile Range

The lowest 25% of the data falls at or below the lower quartile, or Q1. The lowest 50% of the data falls at or below the middle quartile, or Q2, which is also the median. The lowest 75% of the data falls at or below the upper quartile, which is Q3. It is important to note that the distance between the lower and upper quartiles (Q3 minus Q1) is known as the INTERQUARTILE RANGE, or IQR.

This image was sourced from mathisfun.com
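The quartiles and the IQR can be sketched in Python with `statistics.quantiles` from the standard library (Python 3.8 and later). This example reuses the data set {2, 4, 6, 8, 10} from the MEAN section and assumes the "inclusive" method, which treats the data as the whole population. Be aware that textbooks and software packages use several slightly different conventions for quartiles, so other tools may report somewhat different numbers for the same data.

```python
import statistics

# Data set borrowed from the MEAN section
data = [2, 4, 6, 8, 10]

# n=4 asks for the three cut points that split the data into four parts.
# method="inclusive" is an assumption made for this sketch.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# The interquartile range is the distance between Q3 and Q1
iqr = q3 - q1

print(q1, q2, q3)  # 4.0 6.0 8.0
print(iqr)         # 4.0
```

Notice that Q2 is 6, the same value found for the median of this data set earlier.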

PERCENTILES

Unlike quartiles, PERCENTILES divide the data into 100 equal parts, and these parts are represented as percentages.
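As a rough illustration, the same `statistics.quantiles` function can produce percentiles by asking for 100 parts instead of four. The data set below is made up for this sketch (the whole numbers 0 through 100), chosen so that each percentile lands on an easy-to-read value. The "inclusive" method is again an assumption.

```python
import statistics

# Hypothetical data set for illustration: the whole numbers 0 through 100
data = list(range(101))

# n=100 returns the 99 cut points that split the data into 100 parts
percentiles = statistics.quantiles(data, n=100, method="inclusive")

# The list starts at the 1st percentile, so the 25th sits at index 24
p25 = percentiles[24]
p50 = percentiles[49]
p75 = percentiles[74]

print(len(percentiles))  # 99
print(p25, p50, p75)     # 25.0 50.0 75.0
```

The 25th, 50th, and 75th percentiles line up with Q1, Q2, and Q3, which is why quartiles can be thought of as three special percentiles.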

THANK YOU!

This concludes my post on basic statistical terms that you should be familiar with before beginning your transition from the Sex Work industry into data analytics and analysis. I hope this content has been useful to you.

I know it seems that progress is slow, and to be fair, it is! But the things that are the most valuable are often a struggle to acquire. If true mastery were easy to obtain, we would ALL be masters, and NOTHING would be valued in this world.

You are all so courageous for undertaking this journey. I admire you all so much for wanting, and striving, to improve your lives. I am so grateful that you have included me in your transition. I will give you my best, always!

Location: Chicago, IL, USA