As Porn Stars, you don’t have time to get into the weeds of mathematics. You need to be able to search for information in an effective way so that you can learn quickly and efficiently.
In that vein, here is a list of statistical terms that every Sex Worker should know before she embarks on her new journey as a data entrepreneur.
DESCRIPTIVE STATISTICS
First, we shall look at DESCRIPTIVE STATISTICS. So,
what are they? Descriptive statistics are summary numbers that describe
the numerical traits of a variable or data set. Think of
descriptive statistics as a closer look at what certain features mean within
any set (or sets) of data.
Let’s get specific. Here is the list of descriptive
statistics that any Porn Star should be familiar with. I will then define
each one, simply:
MEAN
MEDIAN
MODE
RANGE
STANDARD DEVIATION
QUARTILES
PERCENTILES
MEAN
We will begin with the MEAN. The mean, denoted as “μ” for population data
(pronounced “mew”... yes, like the cute Pokémon), is the sum of all values
within a feature or data set, divided by the total number of data values.
What does that actually mean (pun intended)? Let’s say that we have a data
set with the following numbers:
{2, 4, 6, 8, 10}
Simply add all the
values together. Or,
2 + 4 + 6 + 8 + 10,
which equals 30.
Then, divide that sum by the number of observations we have in the data set.
There are 5 numbers in this set.
So, the mean of this
data set is 30 divided by 5, which equals 6.
30 / 5 = 6
Additionally, the mean
is commonly known as the “average,” and it is also referred to as a measure
of the center of the data.
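If you would like to see the same steps in code, here is a minimal sketch in Python. It assumes nothing beyond Python's built-in statistics module, and the variable names (values, total, count, mean) are my own choices for illustration.

```python
import statistics

# The same data set used in the text above.
values = [2, 4, 6, 8, 10]

# Step 1: add all the values together.
total = sum(values)

# Step 2: count how many values there are.
count = len(values)

# Step 3: divide the sum by the count.
mean = total / count

print(total, count, mean)  # 30 5 6.0

# The built-in statistics module gives the same answer in one call.
print(statistics.mean(values) == mean)  # True
```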
MEDIAN
Let’s discuss the
MEDIAN. The median is another way to find the middle of the data: it is the
middle value once the data has been arranged in ascending order. Unlike the
mean, there is no division involved in calculating it UNLESS there is an even
number of values in your data set. If there is an even number of data points,
you take the two points in the middle, add them, and divide by two to find the
median. What does this look like? I’ll explain.
If you have a data set
with an odd number of observations, order the data from smallest to
largest and take the middle term of the data set. That will be your
median. It looks like this:
{2, 4, 6, 8, 10}
2 and 10 cancel out, 4
and 8 cancel out, and all you are left with is 6. Six is the median of this
data set. As it happens, it is also the mean. But what if we had a data set
with an even number of data points? Consider this data set:
{2, 4, 6, 8, 10, 12}
As we can see, there
are 6 total numbers in this data set. Six is an even number. Working inward
from both ends, we cancel out 2 and 12, then 4 and 10, so that we are left
with 6 and 8. Next we add 6 and 8 together, which equals 14. Then we divide
that sum, 14, by two, which equals 7. Seven is our median for this data set.
It is important to note
that when you add the two middle data points (recall, the 6 and the 8) and
divide by two, you are effectively averaging them together. Refer to the
definition of mean, previously mentioned.
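Here is a short Python sketch of the median steps described above, for anyone who wants to try them. The function name median and the variable names are my own, and the last lines compare the result against Python's built-in statistics.median as a sanity check.

```python
import statistics

def median(values):
    """Return the middle value of the data after sorting it."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: the middle term is the median.
        return ordered[mid]
    # Even number of values: average the two middle terms.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_set = [2, 4, 6, 8, 10]
even_set = [2, 4, 6, 8, 10, 12]

print(median(odd_set))   # 6
print(median(even_set))  # 7.0

# Sanity check against the built-in implementation.
print(median(odd_set) == statistics.median(odd_set))    # True
print(median(even_set) == statistics.median(even_set))  # True
```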
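I would like to note
that mean and median are NOT the same measurement and should NOT be used
interchangeably. Both describe the center of the data, but they are
calculated differently and can give different answers. For example, in the
data set {1, 2, 3, 4, 100} the median is 3, but the mean is 110 / 5 = 22.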
Before I continue with
these statistical terms, I would like to calm your concerns, if you have any.
If you take a typical statistics course, the data sets that you will be making
calculations on will usually not exceed about 20 data points. You will NOT be
expected to order the data by hand unless the data set is small. You will be
allowed to use a calculator, so you can make these calculations quickly. If
you are taking a statistics course within the context of Data Science,
however, you will be using computer software to assist you with the heavy
lifting. Statistical software like R and SAS will typically be used, and
occasionally Python, depending on the class curriculum. So, fear not!
Let’s continue.
MODE
We can now discuss
MODE. This term is relatively simple to understand. The mode is the
value that appears the most frequently in the data set. Here is an easy
example:
{1, 2, 3, 3, 4, 5, 6,
7, 7, 7, 8, 9, 10}
In the above example,
the mode would be 7, because it is the value that occurs the most frequently
in the data set. Now, you may be wondering if it is possible for a data set to
have many modes, and the answer is, ABSOLUTELY!
{1, 2, 3, 3, 3, 4, 5,
6, 7, 8, 8, 8, 9, 10}
In the above example, 3
and 8 occur the most frequently (three times each). So, in this case we have
TWO modes in our data set. This is known as a BIMODAL data set, or a set with
two modes. We can also have TRIMODAL data sets with three modes, and
MULTIMODAL is the general term for a data set with several modes.
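To check the two examples above in code, here is a small Python sketch. It assumes Python 3.8 or newer, where the built-in statistics module includes multimode, which returns every value tied for most frequent. The variable names are my own.

```python
from statistics import multimode

# The two example data sets from the text above.
one_mode = [1, 2, 3, 3, 4, 5, 6, 7, 7, 7, 8, 9, 10]
two_modes = [1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8, 8, 9, 10]

# multimode returns a list, so it works for one mode or many.
print(multimode(one_mode))   # [7]
print(multimode(two_modes))  # [3, 8]

# The length of that list tells you how many modes there are.
print(len(multimode(two_modes)))  # 2, so this set is bimodal
```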
RANGE
RANGE is next on our
list, and this descriptive statistic has a few more components than the
previous terms. Let’s go over them now.
The range is the
difference between the smallest value (known as the “minimum value”) and the
largest value (known as the “maximum value”) in the data set.
The process for finding
the range is straightforward. First, you arrange the numbers in your data set
from smallest to largest. Then you subtract the smallest number from the
largest number. Let’s see what this looks like in the following data set:
{2, 4, 7, 15, 19, 23}
Using the process
noted above, can you determine the range of this data set? If you found the
answer to be 21, then you would be correct. Here is another look at the
formula:
Largest number – Smallest number = Range
23 – 2 = 21
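The same calculation can be sketched in a few lines of Python. The names values and data_range are my own. Note that the built-in max and min functions find the largest and smallest numbers for you, so sorting is not strictly required in code.

```python
# The example data set from the text above.
values = [2, 4, 7, 15, 19, 23]

largest = max(values)    # the maximum value
smallest = min(values)   # the minimum value

# Range = largest number minus smallest number.
data_range = largest - smallest

print(largest, smallest, data_range)  # 23 2 21
```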
STANDARD DEVIATION
Let’s discuss the
STANDARD DEVIATION only briefly, as in practice the calculation for this
descriptive statistic is done with tools like R, SAS, or Python. Let me
reiterate, you will NOT be expected to calculate the standard
deviation (or variance) without the use of software, UNLESS you are taking a
traditional statistics course in school.
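The standard deviation
can be defined as a measure of dispersion (or “spread”) of the data around the
mean of a data set. The standard deviation, also known as σ, or “sigma,” can be
calculated by taking the square root of the variance, and the variance is the
average of the squared differences between each value and the mean.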
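For the curious, here is a rough Python sketch of that calculation, reusing the data set {2, 4, 6, 8, 10} from the MEAN section. I am assuming the population version (σ), which divides by the number of values. The sample version, statistics.stdev, divides by one less than that and gives a slightly larger number. The variable names are my own.

```python
import math
import statistics

values = [2, 4, 6, 8, 10]

# Step 1: find the mean.
mean = sum(values) / len(values)  # 6.0

# Step 2: square each value's difference from the mean.
squared_diffs = [(x - mean) ** 2 for x in values]  # [16.0, 4.0, 0.0, 4.0, 16.0]

# Step 3: the (population) variance is the average of those squares.
variance = sum(squared_diffs) / len(values)

# Step 4: the standard deviation is the square root of the variance.
sigma = math.sqrt(variance)

print(variance)         # 8.0
print(round(sigma, 2))  # 2.83

# Sanity check against the built-in population standard deviation.
print(math.isclose(sigma, statistics.pstdev(values)))  # True
```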
These two terms—standard
deviation and variance—might seem abstract right now, but when we start
creating prediction equations with Regression techniques, they will become
clearer. Just know that these terms are tremendously important within the scope
of statistics and data analytics. Keep them in the back of your mind for now.
Let’s continue.
QUARTILES AND PERCENTILES
QUARTILES divide the
data set into four equal parts, much like the median splits the data in half.
Each part holds roughly 25% of the data. There are 3 main quartiles of
interest. Three quartiles for four parts might seem confusing and
counter-intuitive, but let me explain: the quartiles are the cut points
between the parts.
With the data ordered from smallest to largest, 25% of the data falls at or
below the lower quartile, or Q1. 50% of the data falls at or below the middle
quartile, or Q2, which is also the median. 75% of the data falls at or below
the upper quartile, which is Q3. The remaining 25% of the data lies above Q3.
It is important to note that the distance between the lower and upper
quartile (Q3 – Q1) is known as the INTERQUARTILE RANGE, or IQR.
This image was sourced
from mathisfun.com
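Here is a hedged Python sketch of quartiles and the IQR. The data set is a made-up example of my own (it does not come from this post), and I am assuming Python 3.8 or newer for statistics.quantiles. Be aware that textbooks and software use several slightly different rules for quartiles; this sketch uses the "inclusive" rule, and other rules can give slightly different numbers on the same data.

```python
import statistics

# Hypothetical example data (my own), already in ascending order.
data = [2, 4, 6, 8, 10, 12, 14, 16, 18]

# n=4 asks for the three cut points that split the data into four parts.
# method="inclusive" is one of several quartile conventions.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# The interquartile range is the distance between Q3 and Q1.
iqr = q3 - q1

print(q1, q2, q3)  # 6.0 10.0 14.0
print(iqr)         # 8.0

# Q2 is the same thing as the median.
print(q2 == statistics.median(data))  # True
```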
PERCENTILES
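Unlike quartiles, which use four parts,
PERCENTILES divide the data into 100 equal parts, and these parts are
represented as percentages. For example, Q1 is the 25th percentile, the median
(Q2) is the 50th percentile, and Q3 is the 75th percentile.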
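Percentiles can be sketched in Python the same way. Again, the data set and the helper function percentile are my own illustrations, and I am assuming Python 3.8 or newer with the "inclusive" rule. Other tools may use a different rule and give slightly different values.

```python
import statistics

# Hypothetical example data (my own), already in ascending order.
data = [2, 4, 6, 8, 10, 12, 14, 16, 18]

# n=100 returns the 99 cut points that split the data into 100 parts.
cuts = statistics.quantiles(data, n=100, method="inclusive")

def percentile(p):
    """Return the p-th percentile (p from 1 to 99) of the example data."""
    return cuts[p - 1]

print(len(cuts))       # 99
print(percentile(25))  # 6.0  (same as Q1)
print(percentile(50))  # 10.0 (same as the median)
print(percentile(75))  # 14.0 (same as Q3)
print(percentile(90))  # 16.4
```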
THANK YOU!
This concludes my post on basic statistical terms that you should be familiar with before beginning your transition from the Sex Work industry into data analytics and analysis. I hope this content has been useful to you.
And I know, it seems that progress is
slow, and to be fair, it is! But the things that are the most valuable are
almost ALWAYS a struggle to acquire. If true mastery were easy to obtain, we
would ALL be masters, and NOTHING would be valued in this world.
You are all so courageous for undertaking this journey. I admire you all so much for wanting, and striving, to improve your lives. I am so grateful that you have included me in your transition. I will give you my best, always!