Friday, July 29, 2022

Running Your First Bit of Code, Installing Packages, And Creating a Folder for Your Working Directory

Ok, so here is my first post before we begin our Linear Regression, and it is going to be SUPER basic. We will be running our first line of code, and installing some “packages” that we will need for doing analytics-based projects in R Studio.

 

This post assumes that you have installed both R (the programming language) and R Studio (the programming interface). We will be programming in R Studio as it is easier—and prettier—than using the R tool.

 

If you haven’t installed both on your computer, I will give you a link to my earlier posts for installing properly.

 

Install R and R studio, click here for post.


Once that is finished, you may restart your computer, and log back into your home screen. After you have logged back into your home screen, click on the START menu, then in the search bar, type in “R Studio.” Then double-click on the R Studio icon.

 

You will be brought to a home screen that is split into four sections:

 

The Upper Left is where you will be typing your code.


The Upper Right is where you will see the variables (also known as “objects”) that you create.


The Lower Left is known as the Console, and it is where you will see the “output” of your code.


The Lower Right window is where you will see the graphs that you create.

 

Before we write our first snippet of code, let’s open a new script. Look at the upper left corner of the screen where it says “File.” Click it. Hover over “New File” and select “R Script.” And BOOM! R Studio will open a blank script file for you, which you can save and rename. We will do this later.


Let’s write your first bit of code! Most people start with the classic “Hello, World!” program when they begin learning how to code, but we are going to write a different bit of code.

 

Type the following into the Upper Left Window:


First Line of Code
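In case the screenshot does not load for you, here is a minimal sketch of what the snippet looks like, based on the description in this post. The exact wording of the comment line is my assumption; the text inside the print statement is the one discussed below.

```r
# This line is a comment. R ignores everything after the "#" sign.
print("Analytics for Porn Stars")
```

When you run it, the Console shows the text with a [1] in front of it. That is just R numbering its output, so you can ignore it for now.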

Never mind the text above the “print” statement. That is a “comment.” It is not part of the code. Any text after a “#” sign is considered a comment. R will ignore it when it is making its computations. Comments are to be used as a guide, to remind you and others what your code means.

 

I included the comment to help you run the code. Type everything shown above, highlight the code with your mouse by left-clicking and selecting the code, and finally, press CTRL and ENTER to run the code. Notice that “Analytics for Porn Stars” is printed below in the LOWER LEFT window.

 

In essence, you can type ANYTHING you desire within the set of “” marks. Give it a try. Be sure to highlight your new piece of code, and then press CTRL and ENTER to run your new code. Check the results in the LOWER LEFT WINDOW of R Studio.

 

This is your first line of code, and you should be proud! It signifies that you are taking steps in your lives to make significant changes. Out with the bad, and in with the good! Your new lives, in a less toxic world, begin now!

 

Ok, there are three more lines of code that we need to run before getting started in our Regression Analysis.

 

We need to “install” a package that will be super useful for our projects. It is called “Tidyverse,” and it has pretty much everything we need from pragmatic functions to data visualizations.

 

This is the code that you need to run to install Tidyverse:


Installing Packages
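In case the screenshot does not load for you, the install step is a single line. This is a minimal sketch using R's built-in install.packages() function; note that the package name is written in lower case and wrapped in quotation marks.

```r
# Downloads and installs the tidyverse package (needs an internet connection)
install.packages("tidyverse")
```

The first time you run this, R Studio may ask you to pick a download location (a “CRAN mirror”). Any location near you should work.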


I should warn you that R is case-sensitive. So be sure to type the code exactly the way that I have typed it above, using lower-case and upper-case letters in the same places.

 

So, once you run the code, R Studio will magically download the tidyverse package (you MUST have an internet connection for this).

 

And once you have successfully installed the package, you will not need to run this command again on the same computer. However, you will need to “load” the package into R Studio. And I will show you how to do this now:


Library loading Tidyverse
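In case the screenshot does not load for you, loading the package is also a single line. This is a minimal sketch using R's built-in library() function, and it assumes you have already installed tidyverse. Notice that quotation marks are not needed this time.

```r
# Loads the tidyverse package into your current R session
library(tidyverse)
```

You will see a few messages in the Console listing the packages that were attached. That is normal and not an error.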


Every time you open R Studio and start a new session, you WILL need to use this command to load necessary packages into R Studio. As we get further into this blog, I will show you which packages will be necessary for our use case.

 

Finally, you can minimize out of R Studio by clicking on the “_” button on the UPPER RIGHT CORNER of your screen.

 

We are now going to create a folder that you will use when you are doing your R projects. This is important because you will need to use this folder when you “set your working directory.” The working directory is the folder where R looks for files to read, and where it saves files by default, so setting it lets you refer to your data files by name instead of typing out their full location on your computer.

 

If you are using Windows 11 (or even Windows 10) operating system, you should be able to see a folder-looking icon on the bottom panel of your screen. Left-mouse click it.

 

Folder Icon

On the left side of the new window, you will see an icon called “Documents.” Left-mouse click it.

 

Next, on the upper left part of the window, you will see a “+” icon called “New.” Left-mouse click it and select “Folder.” Name the new folder anything that is sensible and that you can remember. Naming the folder “R Projects” might be such a name. There ya’ go!

 

I will show you how to “select your working directory” in the next post, where it is more relevant.

 

You can right-mouse click on the R Studio icon on the lower panel and click “Close All Windows.” This will allow you to exit R Studio.

 

THANK YOU!

 

Ok, so that is your first—very small—introduction into R Studio and coding. This is a paramount first step in your journey, and I am so excited for you. Continue to make healthier decisions in your life that will lead you to less toxic outcomes in your social life, your professional life, and even your love life.


I admire you all so much for making this transition. You are all very brave! Thank you so much for allowing me to write educational content for you. Enjoy the journey!



Thursday, July 28, 2022

Transitional Careers for Porn Stars: 7 Companies and Industries That Pay Well for Those Who Have Data Analytics Skills

Before we tackle Linear Regression, I would like to highlight 7 Companies that pay well for those who have quantitative backgrounds. I will be looking at companies that are hiring for “Business Analysts” or “Data Analysts,” as these are the roles that are easier to acquire compared to other Data Science and Computer Science roles.

 

The Business Analyst and Data Analyst roles are typical in most business organizations, and as I mentioned, the requirements are less stringent. However, that doesn't mean that these roles are easy to fill. They are quite competitive, and all of them require a college degree, at least a Bachelor’s degree.

 

Despite this, if you are looking to transition from the Sex Worker industry, these roles should be at the top of your list. And, if you do decide to acquire a college degree, these roles are within your reach. Nothing is impossible for you!

 

Yes, the odds are stacked against you because of your background, experiences and profession. With this in mind, you should not limit yourselves to shit roles. “Shoot for the stars, and you’ll reach the moon,” as it were.

 

Keep in mind that these are entry level positions, so there is room for upward mobility within the organization. It typically takes between two and three years to get a promotion, perhaps sooner, if you are a high performer.

 

Without further delay, here is the list:

 

1. MANAGEMENT CONSULTING, MCKINSEY AND COMPANY, BUSINESS ANALYST

 

If you are interested in becoming a consultant at a top firm, McKinsey & Company has few equals. The hours are long, and travel is a requirement for the job, but if you can manage, the entry level salary is $88,471 per year. Remember, this is for your first year out of college. Not bad!

 

Other Management Consultant firms worth noting are Boston Consulting Group (BCG), Accenture, Bain & Company, and Slalom.

 

 2. AVIATION AND TRAVEL, ALASKA AIRLINES, BUSINESS INTELLIGENCE ANALYST

 

If travel and hospitality is your speed, then being a Business Intelligence Analyst at Alaska Airlines might be more favorable as it is a less hectic role than at a business management consulting firm. The average entry level Business Intelligence Analyst at this organization brings in $85,690 each year.

 

Hawaiian Airlines, American Airlines, and United Airlines are other domestic companies to consider for the Business Intelligence Analyst role.

 

3. TRANSPORTATION, THE CHICAGO TRANSIT AUTHORITY

 

Even cities need analysts to manage operations of their transportation companies. The Chicago Transit Authority (CTA) is the organization that runs all of the public transportation in the city, from buses to trains. Data Analysts use statistics to determine optimal and even new routes within the city, how to optimize fuel, cut back on costs, and boost profits.

 

The CTA has entry level roles called Data Analysts, who can be involved in anything from finance to asset management. The typical salary for this type of role is $80,303 per year.

 

Do a Google search for any “transportation and logistics company for (ANY CITY),” and you will find the specific organization that operates in your city of interest.

 

4. GOVERNMENT AGENCIES, CENTRAL INTELLIGENCE AGENCY

 

Now, don’t get discouraged with this suggestion. The Central Intelligence Agency (CIA) is a forward-thinking government agency that encourages talent from all walks of life into their organization.

  

They understand that not everyone is perfect, and as long as you are not participating in illegal activity—drugs, trafficking and prostitution—then applying to the CIA might be a great choice for you.

 

Besides, working for the CIA can be lucrative. Data Scientists at the CIA can make $89,525 per year, but given the range, and the level of experience that you bring to the role, you can make more. The range is from a low of $77,554 to a high of $149,345 per year. Hot-diggity-dawg!

 

The salary isn’t even the best part about working for the CIA. The organization recruits employees who represent the highest standards and are often considered to be the “best of the best.”

 

The caliber of people you will be around—and the general high level of character—will only benefit your healthy transition into a less toxic environment. You will meet others who are ambitious, supremely intelligent, and just downright decent.

 

Now, it is important to note that government agencies do run background checks and will screen you for drugs. So, this procedure can be a barrier for you if you are having trouble getting your life on track. Overcoming unhealthy drug addictions is beyond my imagination, and for those that need to, you have my admiration.

 

But for those who are further along in their transition, being employed at a government agency is not only good for your health, it is good for your personal brand. As I mentioned before, the best of the best work for the government, so you will be constantly pushed by your colleagues to improve as a professional and as a person.

 

Lastly, I want to mention that the US government has a two-year rotational development program that pays very well, and offers you a middle-management position in ANY participating government agency once you finish with the program.

 

It is called the Presidential Management Fellows (PMF) Program. Click this link here for more information. And depending on which city you work in, you can get paid quite well.

 

For instance, the potential salary for those who complete their PMF in the DC metropolitan area ranges from $61,947-$116,788 per year (GS9 to GS12 designation).

 

5. SPORTS, LOS ANGELES LAKERS BASKETBALL TEAM

 

Now, Business Data Analysts for the LA Lakers are not paid as much as the previous roles that I have discussed, but if you love sports and data analytics, this might be a good option. The typical salary range for this role is from $67,882 to $73,820.

 

The LA Lakers is a reputable organization, and the sky is really the limit when it comes to Athletic organizations. Just in the United States alone, there are 152 major sports franchises embodied by the MLB, NHL, NFL, NBA, and the MLS. That’s a lot of sports teams! So if you are a “sporty-kinda-gal” then this industry might be your calling.

 

Go on Google and search “Professional sports teams in (CITY OF INTEREST)” and look for “careers.” Google has its own platform for searching through careers at just about any organization you can think of, and this is no different for sports.

 

6. HOSPITALITY AND LEISURE, CARNIVAL CRUISE LINES  

 

Carnival Cruise lines has a Business Intelligence Role that pays roughly $71,118 per year. This may not seem like much to begin with, but when you consider all the perks that come with this role, it seems to be worth it.

 

Perks at Carnival Cruise Lines are Health Care and Insurance, Paid Time Off, Stock Purchase Plan, Adoption Assistance, a 401(k), Active-Duty Military Benefits, Cruise and Travel Benefits, Retail Perks, and Flexible Work Scheduling.

 

For employees to receive the full advantage of organization benefits, it usually requires a year of full-time employment.

 

But why stop at cruise lines? Business Intelligence Analysts, and the like, have similar perks and advantages while working for large hotel chains like Hilton Hotels. So, don’t end your search with cruise lines.

 

7. CITY EMPLOYEE

 

You might ultimately decide to be a Data Strategist, Research Data Analyst, Finance Analyst, and even Innovation Specialist for a city here in the United States. The initial roles don’t pay that much, but because city organizations are relatively large, there is ample opportunity for upward mobility within the organization.

 

The minimum requirement for a Chief Data Officer in the City of San Francisco is between 6 and 8 years of relevant management experience, and the salary is high; it averages $115,392 each year.

 

WAIT, THAT’S IT?

 

Now, I know some of you might be thinking—especially the high roller exotic dancers—that you make more than these salaries in a year. This may be true, but the lifestyle of working a corporate job is less toxic and less dangerous than the Sex Worker industry. Consequently, the people you will meet, and the relationships you will form working in a less toxic environment will be more rewarding as well.

 

 BASIC GOOGLE RESEARCH

 

Before I conclude this post, I would like to give some insight as to how you can find basic information about certain roles in specific companies that might interest you.

 

First, determine which company you wish to work for. Go to their home webpage. Scroll to the bottom where it says “careers” or “jobs” to find a list of all job openings in that company.


Or you can do the “Google Approach.” Here is a useful formula:

 

Type in the Google Search Bar: “(COMPANY OF INTEREST) careers.” And that’s it.

 

Furthermore, if you see a role that resonates with you (like Data Analyst, or Business Intelligence Analyst) while searching in the company’s job openings list, you can find a rough baseline salary of that role using Google. Here is another formula:

 

Type in the Google Search Bar: “How much money does a Data Analyst make at (COMPANY OF INTEREST).”

 

At the top of the page in the search results, Google will provide a range or average of salaries that you can expect for that role in that specific company.


It’s not an exact number, but if you scroll down the search results, you will find other sites where employees have posted their salaries anonymously at sites like glassdoor, ziprecruiter, payscale, etc. Again, it’s not perfect, but this information will give you a sense of how employees for this type of role are paid.

 

THANK YOU!

 

I hope this post was helpful to you and can be a go-to source for you when you decide that it is time to begin your search for jobs. Some of the jobs pay more than others, and depending on the context, can also be less rewarding. So be sure to weigh the pros and the cons when doing your job search.

 

Thank you so much for using my website as a part of your transformation and transition into your new lives. I have said this once, and I will say it once more—You ladies are incredible, and courageous. Enjoy your transition into a healthier life. Your time is NOW!


Statistics and Machine Learning: Why They are Important and Why You Should Care!

Thank you so much for continuing to use my website as a resource for your professional and entrepreneurial development. I am so happy that I can write about what I love—data analytics and machine learning—for those who can stand to benefit from it.

 

In a world that can be so harsh and cruel, I hope that the teaching of this subject will help level the playing field between those who are in power and those who seek to rise within or outside a toxic environment.

 

You ladies are incredible!

 

Ok, so for today’s post, I would like to discuss why the subject of statistics is so important, and how it is used in the new world we live in, where humans and machines are coexisting.

 

WHY?

 

Why do we use statistics in data analytics and machine learning? What is the primary goal of this field of science and mathematics?

 

Well, many who have contributed to this field argue—quite fervently—that the goal is to make an inference about a population based on data that was collected on a sample of that population. It is difficult to obtain complete data on any given population; arguably, it is impossible to do so.

 

And for this reason, mathematicians and scientists created a form of scientific process and inquiry that allows for the collection and computation of data on a smaller segment of the population, the sample, to make predictions and even verify assumptions about the larger population.



Statisticians are careful about defining their predictions and analysis by pre-determining a MARGIN OF ERROR in their analysis. They understand that there is NO CERTAINTY to making predictions, and this margin of error allows for them to make predictions about a population with a degree of confidence.
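To make the margin of error less abstract, here is a minimal sketch in R. The data is made up with R's random number generator (the numbers 50 and 10 are my assumptions, chosen only for illustration), and the 95% level of confidence is a common convention rather than a rule.

```r
set.seed(42)  # makes the made-up data reproducible

# Hypothetical sample: 100 measurements taken from a much larger population
sample_data <- rnorm(100, mean = 50, sd = 10)

n           <- length(sample_data)
sample_mean <- mean(sample_data)
std_error   <- sd(sample_data) / sqrt(n)   # standard error of the mean

# Critical value for 95% confidence, using the t distribution
t_crit          <- qt(0.975, df = n - 1)
margin_of_error <- t_crit * std_error

# The range in which we are 95% confident the population mean lies
lower <- sample_mean - margin_of_error
upper <- sample_mean + margin_of_error

cat("Estimate:", round(sample_mean, 2), "+/-", round(margin_of_error, 2), "\n")
```

The estimate is reported as a number “plus or minus” the margin of error. A bigger sample shrinks the margin of error, which is why statisticians like to collect as much data as they reasonably can.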

 

Scientists—and now the new age data scientists—have been attempting to infer certain attributes about entire populations since the beginning of mathematics, and there are reasons why the process of data collection can be so cumbersome; data is difficult to collect and can be presented in a form that is messy and almost unusable.

 

 

What have these scientists been trying to predict by using statistics?

 

1. They have been trying to predict future outcomes and returns for financial investments.

 

2. Data Scientists and healthcare professionals have been using analytics and machine learning to predict various traits within the healthcare value chain. Such traits include health insurance claims, number of emergency room visits, cancer rates, and how to use myriad variables within data to improve patient health outcomes.

 

3. Marketing professionals use analytics—by Google and HubSpot—to measure and validate user behaviors, and preferences on websites.

 

4. Software engineers have been able to collect data for the purpose of predicting the reliability of software in production. Further, they can use this data to determine whether to continue building a product or customer experience.

 

5. Economists also use statistics to predict both economic recovery and recessions given a multitude of factors. In the past, such factors have included housing costs, the unemployment rate, levels of education, and consumer activity.

 


As you can see, there are many industries that use statistics and machine learning. But what is “machine learning?” And how is it related to statistics?

 


Think of machine learning as the technology component, and statistics as the mathematical component. Presently, Machine Learning will often use sophisticated computer software and programming languages to spit out predictions. Statistics is the mathematical language that is used for machine learning algorithms. So, they are both intertwined.

 


I will continue to bore you at this point, with more statistical definitions that will be useful to know. They will help you interpret certain parameters within your projects and analysis, and they are helpful to know when conversing with your supervisors at work, or even laypersons who are not well-versed in the subject.

 


POPULATION and PARAMETER

 


As I mentioned before, the goal of statistics is to make inferences about a population of interest. But how is population defined? Simply, a population is a large (very large) collection of individuals, objects, or entities. American citizens, millennial students, or even dogs can be considered populations.

 


A “population parameter” is a number like a mean or even percentage, that describes the population.

 


SAMPLE and STATISTIC

 

In many, if not all, cases, it is impossible to collect information about an entire population. So, information is collected from a sample of the population instead. A sample is a smaller group drawn from the population.

 


A “sample statistic” is a number like a mean or a percentage, that describes the sample.

 


DON’T GET IT TWISTED!

 

When I think of the difference between parameter and statistic, I recall that parameter begins with the letter “p,” just like population. Statistic, on the other hand, begins with the letter “s,” just like sample. So, think of it like this:

 

A Parameter refers to a Population, P for P.

A Statistic refers to a Sample, S for S.
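Here is a minimal sketch in R of the difference between the two. In real life we almost never have the whole population, so the “population” below is made up (100,000 pretend salaries, with numbers I chose only for illustration) just so that we can compare the two values side by side.

```r
set.seed(1)  # makes the made-up data reproducible

# Hypothetical population: 100,000 made-up salaries
population <- rnorm(100000, mean = 60000, sd = 15000)

# Parameter: a number that describes the Population (P for P)
population_mean <- mean(population)

# Draw a random sample of 200 individuals from the population
my_sample <- sample(population, size = 200)

# Statistic: a number that describes the Sample (S for S)
sample_mean <- mean(my_sample)

cat("Population mean (parameter):", round(population_mean), "\n")
cat("Sample mean (statistic):    ", round(sample_mean), "\n")
```

The two numbers will be close, but not identical. That gap is exactly why statisticians report a margin of error.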

 


OTHER GOALS FOR STATISTICS

 

As I mentioned previously, the primary goal of statistics is to make an inference about a population of interest. Other goals are to test hypotheses, and to draw conclusions concerning the correlation between observed factors of interest.

 

The main bulk of the content for this website will be how you can run a Full Linear Regression Model, as it is one of the most common methods for making predictions. EVERYONE uses this method! So, I will focus more on that.

 


THANK YOU!

 

So, this will be the last boring post… I hope! I want to get into R programming and Regression analysis right away, so that you can get your feet wet, and begin your journey officially as data analysts or entrepreneurs. However, there will be many supplemental posts that offer explanations of how to interpret specific outputs of the regression, and yes, those posts will be boring. Yet, they are completely necessary!

 


You ladies are the best! Thank you for giving me the chance to help you with your transition, and I look forward to your journey moving forward in this exciting world of data analytics and machine learning.

Data Visualization Basics: 3 Plots and Graphs That Will Make Your Regression Analysis Project Easier

Thank you so much for your continued support and visiting my website. I hope that my content can be useful to all Porn Stars—or anyone and everyone in the Sex Industry—and it is certainly my honor creating a space for you all to access such a relevant and worthy field.

 

At this point, I expect that you read my previous post, “6 Basic Terms That You Should Know For Data Science and Analytics.” I will link that here. This post will go deeper into data science and analytics in that I will introduce to you some basic, yet necessary, graphs and charts that will be vital for your development as data analysts and data entrepreneurs.



I will be discussing the following content:

 

HISTOGRAMS, SKEWNESS and NORMAL DISTRIBUTION

BOX PLOTS

SCATTERPLOTS and CORRELATION COEFFICIENTS

 

Before I discuss the content for this blog post, I would like to put the above terms in context with Regression Analysis.

 

MACHINE LEARNING AND REGRESSION ANALYSIS

 

Regression Analysis is a form of “supervised machine learning,” in that a machine will ultimately “learn” to make predictions from examples that humans supply, where the correct answer (the value we want to predict) is already known. It is very different from “unsupervised machine learning” in that these human-provided answers are a necessary component for the machine to “grow smarter.”

 

In “unsupervised machine learning” algorithms—we will learn about one of them called Clustering, later—machines are able to detect patterns in the data on their own. The only caveat is that machines need a tremendous amount of data.

 

Back to Regression Analysis.

 

Today’s blog post leans on data visualization as a tool for creating a prediction equation, or more formally known as a Regression algorithm. There are certain assumptions and checks that a data analyst or machine learning expert needs to conduct so that their Regression model (an equation with one or many variables) is appropriate to make predictions—the real magic of machine learning.

 

Charts and plots are an industry standard for checking these assumptions (more on this in a later post). For example, we will eventually be checking to see if there are ANY correlations (or relationships) between the response variable (the variable we are interested in predicting) and the explanatory variables (the independent variables that may influence the response variable).

 

In other words, though this post will be basic, there will be relevant information for you so that you can run a Regression algorithm AND create your own projects that you can show to potential employers. Regression is a common, yet POWERFUL, machine learning application that is used in pretty much any industry where you need to make a prediction about ANYTHING!

 

So, graphs and plots are important tools! Let’s begin with Histograms.

 

HISTOGRAMS

 

A HISTOGRAM is a graph that depicts the distribution of numerical data. A histogram commonly takes one of three shapes: a symmetrical distribution, a right-skewed distribution, or a left-skewed distribution. Here is a symmetrical, bell-shaped distribution, otherwise known as a normal distribution:


Normal Distribution of Histogram


We can see from the graph that it is perfectly symmetrical. This means that the mode, mean, and median are ALL equal, and are positioned in the center of the distribution, where most of the data is observed. This shape is also known as a bell curve.
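If you would like to draw a histogram like this yourself, here is a minimal sketch in R. The data is made up with R's random number generator (the center of 170 and spread of 10 are my assumptions, chosen only for illustration).

```r
set.seed(123)  # makes the made-up data reproducible

# Hypothetical data: 10,000 values from a normal distribution
values <- rnorm(10000, mean = 170, sd = 10)

# Draw the histogram (it appears in the Lower Right window of R Studio)
hist(values, breaks = 30,
     main = "Roughly Normal Distribution", xlab = "Value")

# In a symmetric distribution, the mean and the median are nearly the same
mean(values)
median(values)
```

Real data is never perfectly symmetrical, so expect the mean and median to be very close rather than exactly equal.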

 

To put this in context with Regression Analysis, our first task would be to determine if the “response variable,” or the variable that we wish to predict, is normally distributed. Ideally, we want our histogram of the response variable to look like this! (Strictly speaking, the formal assumption is about the model’s errors, known as residuals, but a roughly bell-shaped response variable is a useful first check.) More on this in another blog post.

 

So, if a histogram doesn’t have a normal shape, then what else can it look like?


Left and Right Skewness of Histogram

In the above graphic, we have both a Left and Right-Skewed distribution.

 

Left-Skewed distributions are known to have a “negative skewness,” and Right-Skewed distributions are known to have a “positive skewness.”

 

As you can see, the Left-Skewed graph has most of its data positioned on the right side of the graph. Think of a Left-Skewed distribution as a skier gliding down a mountainside (or slope) to the left. The opposite can be said of Right-Skewed distributions.

 

If you can recall, Normal or symmetric distributions have the mode, mean, and median, ALL equaling each other. Now, there is a useful relationship between these three descriptive statistics in that:

 

For Left-Skewed Distributions: The mode (or the peak of the data) is typically larger than the median, and the median is typically larger than the mean.

 

For Right-Skewed Distributions: The mode is typically smaller than the median, and the median is typically smaller than the mean.
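You can check this relationship between the mean and the median for yourself with a minimal sketch in R. The data is made up: I am assuming an exponential distribution for the right-skewed example because it has a long tail to the right, and I flip it around to get a left-skewed example.

```r
set.seed(7)  # makes the made-up data reproducible

# Hypothetical right-skewed data: most values are small, a few are very large
right_skewed <- rexp(10000, rate = 1)

# Mirror image of the same data: long tail to the left
left_skewed <- -right_skewed

# Draw both histograms side by side
par(mfrow = c(1, 2))
hist(right_skewed, breaks = 30, main = "Right-Skewed", xlab = "Value")
hist(left_skewed,  breaks = 30, main = "Left-Skewed",  xlab = "Value")
par(mfrow = c(1, 1))

# Right-skewed: the mean gets pulled toward the long right tail
mean(right_skewed) > median(right_skewed)   # TRUE

# Left-skewed: the mean gets pulled toward the long left tail
mean(left_skewed) < median(left_skewed)     # TRUE
```

The mean chases the tail because it is sensitive to extreme values, while the median is not.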


But why is noting this useful?

 

Let’s say that you were analyzing employment report data for Harvard Business School MBA graduates. We might observe that the Median salary for graduates is $165,000, and that the Mean salary is $158,937. (These are hypothetical numbers).

 

Since the Median is larger than the Mean, we can infer that the data is Left-Skewed, assuming it is unimodal. This might be an important deduction for deciding whether or not to pursue a Harvard MBA, as this information tells us that more students lean closer to the Mode post-graduate salary, which we expect to be greater than $165,000. And because the Median is $165,000, roughly half of the graduates make $165,000 or more.

 

Knowing this, we can make an inference, as to the salary that we might make after graduating from Harvard’s MBA program, given that we have other factors that are common to most of the incoming cohort—like number of work years prior to entry, GMAT testing scores, Undergraduate GPA score, and industry relevant experience.

 

Just knowing the shape of the distribution can help us be CONFIDENT about these assumptions given other factors of note. We can never be certain though, and this is a downside to statistics and any form of science, in general.

 

Let’s discuss Boxplots.

 

BOXPLOTS

 

The Boxplot is also known as a “box-and-whisker” plot, and it is a graphical representation of the Five-Number Summary (the minimum, the first quartile Q1, the median, the third quartile Q3, and the maximum).


Box Plot


There is a lot to unpack here, so please be patient with me. Let’s break this plot down beginning with the middle.


The yellow line in this plot is known as the median. Sometimes, depending on which software  you are using, there will also be a small diamond-shaped object near this line, and that shape represents the mean of the data.

 

The Red Box is enclosed by Q1 and Q3. The distance between them (Q3 minus Q1) is known as the Interquartile Range (IQR), which you may remember from the previous post.

 

The two purple lines are drawn outward to two more lines that come to an intersection. The intersection on the left (sometimes at the bottom) is called the “Minimum” value, and the intersection on the right (sometimes at the top) is called the “Maximum” value. When outliers are drawn separately, as they are here, these mark the smallest and largest values that are not outliers. These are the whiskers of the boxplot.

 

The Green dots beyond either end of the whiskers are called Outliers. These data points are special instances whose values are unusually far from the bulk of the data points in the data set.


Let’s think about the average salary in the US, roughly $56,310 in 2020, according to the Bureau of Labor Statistics. An outlier would be someone like a corporate executive who made, perhaps, a whopping $2,500,000 that year.

 

This person is not typical! There are more who made closer to the average salary, than there are who brought in a seven figure salary.

 

This example represents a Right-Skewed distribution, in which most of the data sits near the peak and a long tail stretches out to the right. The outlier would be far out, residing in that tail. Far fewer people earn salaries out in the tail than earn salaries near the peak. Professional athletes are another example of outliers in salary data (many of them make far more money than the average American citizen).

 

One last observation! The above boxplot looks symmetric, which is what we would expect from a Normal Distribution, and you can tell because the whiskers are equal in length (just eyeballing it). If the Right whisker is longer than the left, we likely have a Right-Skewed distribution. If the Left whisker is longer than the right, we likely have a Left-Skewed distribution.
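Here is a minimal sketch in R that ties the pieces of a boxplot together. The salaries are made up (in thousands of dollars), with one extreme value added on purpose. To flag outliers, I am using the common “1.5 times the IQR” rule, which is also the default that R’s boxplot() function uses; other software may use a different rule.

```r
# Hypothetical salaries, in thousands of dollars (the last one is extreme)
salaries <- c(38, 42, 45, 48, 50, 52, 55, 56, 58, 60, 63, 67, 72, 2500)

# Five-Number Summary: minimum, lower quartile, median, upper quartile, maximum
fivenum(salaries)

# Quartiles and the Interquartile Range (IQR = Q3 - Q1)
q1  <- quantile(salaries, 0.25)
q3  <- quantile(salaries, 0.75)
iqr <- IQR(salaries)

# Anything beyond these "fences" is flagged as an outlier
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
outliers <- salaries[salaries < lower_fence | salaries > upper_fence]
outliers

# Draw the boxplot sideways, like the picture above
boxplot(salaries, horizontal = TRUE,
        main = "Hypothetical Salaries", xlab = "Salary (thousands of dollars)")
```

Notice how the single extreme salary drags the mean far above the median. Try running mean(salaries) and median(salaries) to see the gap for yourself.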

 

Let’s move on to Scatter Plots and Correlation Coefficients.

 

SCATTER PLOTS

 

A Scatter Plot depicts the relationship between two numerical variables, with one plotted on each axis. Scatter Plots are useful in the beginning of a Regression Analysis when you need to determine whether the response variable (y-variable) has a “linear” correlation with any of the explanatory variables (x-variables). Linear Regression relies on these linear relationships, and they are necessary for the prediction equation, and the final Regression line, to hold merit.

 

Scatter Plots can have Positive, Negative, or No Correlation. Here is what that looks like:


Scatter Plots Directionality



In a Positive Linear Relationship, the values on the y-axis (vertical axis) tend to increase as the values on the x-axis (horizontal axis) increase. It has an upward slope, pointing up and to the right.

 

In a Negative Linear Relationship, the values on the y-axis (vertical axis) tend to decrease as the values on the x-axis (horizontal axis) increase. It has a downward slope, pointing down and to the right.

 

A graph with No Correlation cannot be classified as either a positive or a negative linear relationship. There is no pattern, and the data points appear to be scattered randomly.
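To make the three directions concrete, here is a minimal sketch in R. Everything in it is made up for illustration: the variable names are hypothetical, and the "noise" is a simple repeating zig-zag (chosen so the results are the same every time you run it) rather than real-world data. The cor() function, which we will talk about more in a moment, measures the direction and strength of the linear relationship.

```r
x <- 1:20

# A made-up zig-zag "noise" pattern with no upward or downward trend
noise <- rep(c(3, -3, -3, 3), length.out = 20)

y_positive <- 2 * x + noise    # y goes UP as x goes up
y_negative <- -2 * x + noise   # y goes DOWN as x goes up
y_none     <- noise            # y does not depend on x at all

r_positive <- cor(x, y_positive)
r_negative <- cor(x, y_negative)
r_none     <- cor(x, y_none)

print(c(r_positive, r_negative, r_none))

# In R Studio, you can draw each scatter plot like this:
# plot(x, y_positive)
```

The first value should come out close to 1, the second close to -1, and the third at (or extremely near) zero.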

 

For a Regression Analysis, you MUST have either a Positive or a Negative correlation between the response variable and the explanatory variable.

 

Furthermore, correlations can be Weak, Moderate, Strong, or Perfect. Here is what that looks like:

 

Scatter Plot Strength


The Stronger the correlation, the MORE data points fit along—or are at least, close to—the Regression line (depicted by the red line). A plot with a Strong Positive Linear Correlation will look like the plot on the upper left. It will have a positive Correlation Coefficient value closer to 1 (more on this soon).

 

Similarly, there can exist a plot with a Strong Negative Linear Correlation, and that plot is represented by the one in the upper right. It will have a Correlation Coefficient closer to negative 1.

 

The Weaker the correlation, the FEWER data points fit on or along the Regression line. You can observe a Weak Positive Linear Correlation on the lower left, and a Weak Negative Linear Correlation on the lower right. The Weaker the correlation, the closer the Correlation Coefficient is to zero.
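Here is a quick sketch of strong versus weak in R, using made-up data. Both y-variables follow the same upward trend; the only difference is how far the points are scattered around it. The zig-zag "noise" pattern is an assumption I chose so the output is repeatable.

```r
x <- 1:20
noise <- rep(c(3, -3, -3, 3), length.out = 20)  # made-up scatter

y_strong <- x + noise       # points stay close to the line
y_weak   <- x + 5 * noise   # same trend, five times the scatter

r_strong <- cor(x, y_strong)
r_weak   <- cor(x, y_weak)

print(r_strong)
print(r_weak)
```

The more scatter there is around the line, the closer the correlation coefficient slides toward zero.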

 

Now that I have confused you with correlation coefficients, I will discuss them here in greater detail.

 

CORRELATION COEFFICIENTS

 

Correlation Coefficients are commonly denoted as “r” scores, and these scores range from negative one (-1) to positive one (+1).

 

These scores are measured between two variables, a dependent variable (y-variable, or response variable) and an independent variable (x-variable, or explanatory variable).

 

The score represents both the Strength and the Direction of the correlation.

 

There is no universal, exact set of cutoffs, but here are some sensible benchmarks to help you begin describing your correlation coefficient. Keep in mind that Positive “r” scores represent a Positive Linear Relationship (recall when the graph is going up and to the right). Conversely, Negative “r” scores represent a Negative Linear Relationship (when the graph is going down and to the right):

 

1 would be Perfect Positive Linear Correlation

0.8 would be Strong Positive Linear Correlation

0.6 would be Moderate Positive Linear Correlation

0.3 would be Weak Positive Linear Correlation

0 would be No Linear Correlation

-0.3 would be Weak Negative Linear Correlation

-0.6 would be Moderate Negative Linear Correlation

-0.8 would be Strong Negative Linear Correlation

-1 would be Perfect Negative Linear Correlation
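If you like, you can turn the benchmarks above into a little helper function. This is only a sketch: describe_r() is a hypothetical function I made up (it is not part of R), and treating each benchmark as the lower edge of a band (for example, 0.6 up to 0.8 counts as Moderate, and anything under 0.3 counts as little to no correlation) is my own assumption, since there is no official set of cutoffs.

```r
# Hypothetical helper: turns an "r" score into a plain-English label.
# The cutoffs are rough benchmarks, not official rules.
describe_r <- function(r) {
  stopifnot(is.numeric(r), length(r) == 1, abs(r) <= 1)
  strength <- abs(r)

  if (strength < 0.3) {
    return("Little to No Linear Correlation")
  }

  label <- if (strength == 1) {
    "Perfect"
  } else if (strength >= 0.8) {
    "Strong"
  } else if (strength >= 0.6) {
    "Moderate"
  } else {
    "Weak"
  }

  direction <- if (r > 0) "Positive" else "Negative"
  paste(label, direction, "Linear Correlation")
}

print(describe_r(0.85))
print(describe_r(-0.6))
print(describe_r(0.1))
```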

 

Keep in mind that when you are comparing your x and y-variables with scatterplots, you want to see “r” scores of Moderate or stronger (0.6 or higher, or -0.6 or lower) for Regression Analysis to be an appropriate tool. Scores in that range suggest that there IS a linear relationship between the variables, and that Regression can reasonably be used to make predictions.

 

 

THANK YOU, AGAIN, and ALWAYS!

 

It is my pleasure and honor to continue to post about analytics for you in an accessible way. This is all very boring at the moment, and that’s ok, because the fun part is coming soon! Don’t get discouraged if the material seems ambiguous and complicated. Part of the beauty of the world wide web is that information is abundant and permanent. You can read these posts as many times as you wish for the content to stick. Enjoy them, and have fun with them!



Stats Review: 6 Basic Terms That You Should Know For Data Science and Analytics

As Porn Stars, you don’t have time to get into the weeds of mathematics. You need to be able to search for information in an effective way so that you can learn quickly and efficiently. 


In that vein, here is a list of statistical terms that every Sex Worker should know before she embarks on her new journey as a data entrepreneur.

 

DESCRIPTIVE STATISTICS

 

First, we shall look at DESCRIPTIVE STATISTICS. So, what are they? Descriptive statistics are summary statistics that describe the numerical traits of a given set of variables or data set. Think of descriptive statistics as a closer look at what certain features mean within any set, or sets, of data.

 

Let’s get specific. Here is the list of descriptive statistics that any Porn Star should be familiar with, and then I will define them, simply:

 

MEAN

MEDIAN

MODE

STANDARD DEVIATION

QUARTILES

PERCENTILES

 

MEAN

 

We will begin with the MEAN. The mean, denoted as “μ” for population data (pronounced as “mew”… yes, like the cute Pokémon), is the sum of all values within a feature or data set, divided by the total number of data values. What does that actually mean (pun intended)? Let’s say that we have a data set with the following numbers:

 

{2, 4, 6, 8, 10}

 

Simply add all the values together. Or,

 

2 + 4 + 6 + 8 + 10, which equals 30.


Then, divide that sum by the number of observations we have in the data set. There are 5 numbers in this set.

 

So, the mean of this data set is 30 divided by 5, which equals 6.

 

30 / 5 = 6
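Here is how the same calculation might look in R. This is just a sketch with variable names I picked for illustration; sum(), length(), and mean() are all built into R.

```r
values <- c(2, 4, 6, 8, 10)

total <- sum(values)       # 2 + 4 + 6 + 8 + 10 = 30
n     <- length(values)    # 5 numbers in the set

manual_mean  <- total / n      # 30 / 5 = 6
builtin_mean <- mean(values)   # R does both steps for you

print(manual_mean)
print(builtin_mean)
```

Both approaches give the same answer, so in practice you would simply call mean().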

 

Additionally, the mean is commonly known as the “average,” and it is also referred to as the measure of the center point of the data.

 

MEDIAN

 

Let’s discuss the MEDIAN. The median is the middle value of data that has been arranged in ascending order. Unlike the mean, there is no division involved in calculating it UNLESS there is an even number of values in your data set. If there is an even number of data points in your data set, you will take the two points in the middle, add them, and divide by two to find the median. What does this look like? I’ll explain.

 

If you have a data set with an odd number of observations, order the data from smallest to largest and take the middle term. That will be your median. That looks like this:

 

{2, 4, 6, 8, 10}

 

2 and 10 cancel out, 4 and 8 cancel out, and all you are left with is 6. Six is the median of this data set. As it happens, it is also the mean. But what if we had a data set with an even number of data points? Consider this data set:

 

{2, 4, 6, 8, 10, 12}

 

As we can see, there are 6 total numbers in this data set. Six is an even number. So, then we can cancel out 2 and 12, and we can cancel out 4 and 10, so that we are left with 6 and 8. Next we add 6 and 8 together, which equals 14. Then we divide that sum, 14, by two, which equals 7. Seven is our median for this data set.
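Both cases can be checked in R with the built-in median() function. The sketch below uses the two data sets from above (the variable names are my own), plus a shuffled copy to show that R sorts the data for you.

```r
odd_set  <- c(2, 4, 6, 8, 10)
even_set <- c(2, 4, 6, 8, 10, 12)

median_odd  <- median(odd_set)    # the single middle value
median_even <- median(even_set)   # the average of the two middle values

# median() sorts internally, so the order you type the values in does not matter
shuffled_set    <- c(10, 2, 8, 12, 4, 6)
median_shuffled <- median(shuffled_set)

print(median_odd)
print(median_even)
print(median_shuffled)
```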

 

It is important to note that when you are adding and dividing the two middle data points (recall, the 6 and the 8), you are effectively averaging them together. Refer to the definition of mean, previously mentioned.

 

I would like to note that mean and median are NOT the same measurement and should NOT be used interchangeably. Both describe the center of the data, but they are calculated differently and can give very different answers.

 

Before I continue with these statistical terms, I would like to calm your concerns, if you have any. If you take a typical statistics course, the data sets that you will be making calculations on will not exceed 20 data points. You will NOT be expected to order the data by hand unless the data set is small. You will be allowed to use a calculator, so you can make these calculations quickly. If you are taking a statistics course within the context of Data Science, however, you will be using computer software to assist you with the heavy lifting. Statistical software like R and SAS will typically be used, and occasionally Python, depending on the class curriculum. So, fear not!

 

Let’s continue.

 

MODE

 

We can now discuss MODE. This term is relatively simple to understand. The mode is the value that appears the most frequently in the data set. Here is an easy example:

 

{1, 2, 3, 3, 4, 5, 6, 7, 7, 7, 8, 9, 10}

 

In the above example, the mode would be 7, because it is the value that occurs the most frequently in the data set. Now, you may be wondering if it is possible for a data set to have more than one mode, and the answer is, ABSOLUTELY!

 

{1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8, 8, 9, 10}

 

In the above example, 3 and 8 occur the most frequently. So, in this case we have TWO modes in our data set. This is known as a BIMODAL data set, or a set with two modes. We can also have TRIMODAL data sets (three modes), and MULTIMODAL is the general term for any data set with more than one mode.
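One quirk worth knowing: R's built-in mode() function does NOT find the statistical mode (it reports how a value is stored). So here is a minimal sketch of a helper that counts the values with table() and keeps whichever ones appear the most. The name find_modes() is hypothetical, something I made up for this example.

```r
# Hypothetical helper: returns every value that ties for "most frequent"
find_modes <- function(x) {
  counts <- table(x)                               # how often each value appears
  most_frequent <- names(counts)[counts == max(counts)]
  as.numeric(most_frequent)
}

one_mode_set <- c(1, 2, 3, 3, 4, 5, 6, 7, 7, 7, 8, 9, 10)
two_mode_set <- c(1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 8, 8, 9, 10)

modes_one <- find_modes(one_mode_set)   # a single mode
modes_two <- find_modes(two_mode_set)   # a bimodal set returns two values

print(modes_one)
print(modes_two)
```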

 

RANGE

 

RANGE is next on our list, and this descriptive statistic has a couple more components than the previous terms. Let’s go over them now.

 

The range is the difference between the smallest value (known as the “minimum value”) and the largest value (known as the “maximum value”) in the data set.

 

The process for finding the range is straightforward. First, you arrange the numbers in your data set from smallest to largest. Then you subtract the smallest number from the largest number. Let’s see what this looks like in the following data set:

 

{2, 4, 7, 15, 19, 23}

 

Using this process noted above, can you determine the range of this data set? If you found the answer to be 21, then you would be correct. Here is another look at the formula:

 

Largest number – Smallest number = Range

 

23 – 2 = 21
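In R, the same steps might look like the sketch below (variable names are my own). One thing to watch for: R's range() function gives you the smallest and largest values as a pair, NOT the difference between them, so you still need to subtract.

```r
values <- c(23, 4, 19, 2, 15, 7)   # same numbers as above, deliberately unsorted

smallest <- min(values)   # minimum value: 2
largest  <- max(values)   # maximum value: 23

range_value <- largest - smallest   # 23 - 2 = 21

# range() returns c(min, max); diff() subtracts the first from the second
range_pair   <- range(values)
range_value2 <- diff(range_pair)

print(range_value)
print(range_pair)
print(range_value2)
```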

 

STANDARD DEVIATION

 

Let’s discuss the STANDARD DEVIATION quite basically, as the calculation for this descriptive statistic will be computed by programming and scripting languages like R, SAS, or Python. Let me reiterate, you will NOT be expected to calculate the standard deviation (or variance) without the use of software, UNLESS you are taking a traditional statistics course in school.

 

The standard deviation can be defined as a measure of the dispersion (or “spread”) of the data around the mean of a data set. The standard deviation, denoted as σ (“sigma”) for population data, can be calculated by taking the square root of the variance.
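Here is a small sketch in R so you can see the relationship between the two terms. One detail to be aware of: R's built-in var() and sd() functions compute the SAMPLE versions (they divide by n - 1), so I have also included the population versions (dividing by n) by hand. The variable names are my own.

```r
values <- c(2, 4, 6, 8, 10)
n <- length(values)

# Built-in functions: sample variance and sample standard deviation (divide by n - 1)
sample_var <- var(values)
sample_sd  <- sd(values)

# Population versions, computed by hand (divide by n)
squared_deviations <- (values - mean(values))^2
population_var <- sum(squared_deviations) / n
population_sd  <- sqrt(population_var)

print(sample_var)
print(sample_sd)
print(population_var)
print(population_sd)
```

Either way, the standard deviation is just the square root of the matching variance.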

 

These two terms—standard deviation and variance—might seem abstract right now, but when we start creating prediction equations with Regression techniques, they will become clearer. Just know that these terms are tremendously important within the scope of statistics and data analytics. Keep them in the back of your mind for now.

 

Let’s continue.

 

QUARTILES AND PERCENTILES

 

QUARTILES divide the data set into four equal parts, much like the median splits the data in half. Each part, in effect, holds roughly 25% of the data. There are 3 main quartiles of interest, because it takes only three cut points to make four parts. This might seem confusing and counter-intuitive but let me explain.


Quartile Range


The lowest 25% of the data falls at or below the lower quartile, or Q1. 50% of the data falls at or below the middle quartile, or Q2, which is also the median. 75% of the data falls at or below the upper quartile, which is Q3. It is important to note that the distance between the lower and upper quartiles (Q3 minus Q1) is known as the INTERQUARTILE RANGE, or IQR.

 

This image was sourced from mathisfun.com
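To see the quartiles and the IQR in action, here is a minimal sketch in R using a made-up data set. Keep in mind that there are several slightly different ways to calculate quartiles, and R's quantile() function uses one particular default method, so a textbook or calculator may give slightly different answers on some data sets.

```r
values <- c(2, 4, 6, 8, 10, 12, 14, 16, 18)   # made-up data

# The three quartile cut points: Q1, Q2 (the median), and Q3
quartiles <- quantile(values, probs = c(0.25, 0.50, 0.75))

q1 <- unname(quartiles[1])
q2 <- unname(quartiles[2])
q3 <- unname(quartiles[3])

iqr_by_hand <- q3 - q1        # distance between the upper and lower quartiles
iqr_builtin <- IQR(values)    # R's built-in shortcut

print(quartiles)
print(iqr_by_hand)
print(iqr_builtin)
```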

 

PERCENTILES

 

Unlike quartiles, PERCENTILES divide the data into 100 equal parts, and these parts are represented as percentages.
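Percentiles use the same quantile() function in R. You just ask for a different cut point. The sketch below uses made-up scores from 0 to 100 (my own example data) and also shows that the 25th, 50th, and 75th percentiles are the same thing as Q1, Q2, and Q3.

```r
scores <- 0:100   # made-up scores, one of each value from 0 to 100

# The 90th percentile: the value that 90% of the data falls at or below
p90 <- quantile(scores, probs = 0.90)

# The 25th, 50th, and 75th percentiles are the three quartiles
quartile_percentiles <- quantile(scores, probs = c(0.25, 0.50, 0.75))

print(p90)
print(quartile_percentiles)
```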

 

THANK YOU!

 

This concludes my post on basic statistical terms that you should be familiar with before beginning your transition from the Sex Work industry into data analytics and analysis. I hope this content has been useful to you. And I know. 


It seems that progress is slow, and to be fair, it is! But often, things that are the most valuable are ALWAYS a struggle to acquire. If true mastery was easy to obtain, we would ALL be masters, and NOTHING would be valued in this world.

 

You are all so courageous for undertaking this journey. I admire you all so much for wanting, and striving, to improve your lives. I am so grateful that you have included me in your transition. I will give you my best, always! 
