Wednesday, August 10, 2022

Database Management Post 2: SQL, From The Beginning

Where better to start than from the beginning? I would like to use this post to achieve the following objective:

 

1.         Create a basic database by using DDL and DML statements.

 

 

SQL is divided into two main families of statements, DDL and DML.

 

 

DDL, or Data Definition Language, encompasses the statements CREATE, DROP, ALTER, RENAME. And DML, or Data Manipulation Language contains the statements SELECT, INSERT, UPDATE, DELATE. 


These statements have different functionality within the entire framework of SQL and will be identified more specifically later.

 

Let’s start!

 

We will begin by loading the “sqldf” package into our environment and turn off any warnings that we might receive.

 

options(warn=-1)
library(sqldf)

## Loading required package: gsubfn

## Loading required package: proto

## Loading required package: RSQLite

 

Great! Now we can create our first relational database. Let’s create a database that tracks income over a period of time. (We won’t be focused on specificity or complexity at first, however). We will call our database, INCOME. Here is how we will create it:

 

db <- dbConnect(SQLite(), dbname="INCOMEdb.sqlite")

 

Notice that we have created an object called “db,” short for database, and that we have given it the title of “INCOME” as expressed in the dbname argument.

 

In that database, we will create our first table called MEMBERS. In that table, we will add four columns. This table is meant to be basic, at first, as we can add columns later in our journey to have more specific information regarding the observations within the table. 


For now, this is what the code looks like if we add the four columns, MNo (as in member number), Name, Earnings, Gender (Male or Female).

 

sqldf("CREATE TABLE MEMBERS
            (MNo INTEGER PRIMARY KEY,
            Name TEXT NOT NULL,
            Earnings INTEGER NOT NULL,
            Gender TEXT NOT NULL
            )",dbname="INCOMEdb.sqlite")

## data frame with 0 columns and 0 rows

 

Before I display the table in our environment, I would like to describe the attributes and constraints that I have given each column. This is important, as it helps the algorithm determine what types of information is related to that column.

 

Firstly, I used the CREATE TABLE statement, and created a table called MEMBERS. Then I gave the column called “MNo” the attribute of an INTEGER (which is a whole number), and it is a Primary Key, as in it is unique attribute related this table.

 

Next, I gave the “Name” column the attribute of TEXT. Meaning that the information stored in this column will have text, and not a number. I also constrained this column to be NOT NULL. And this means that the values in this column are known and NOT MISSING.

 

The “Earnings” column will also be an integer, as incomes are expressed in numbers. It will have the constraint of NOT NULL. This information will be recorded for every person in that is listed in the MEMBERS table, and as a result, there will be no missing information pertaining to “Earnings.”


Finally, Gender will have the attribute of TEXT (to be recorded as Male or Female), and this information will always be recorded in the MEMBERS table, so we will give the constraint of NOT NULL.

 

This table is very basic at the moment. Later on, when creating a database that is more informative, we might add additional columns like “Job Function” or ”Role.”

 

Let’s see what the table looks like in our environment, so far.

 

sqldf("pragma table_info(MEMBERS)",dbname="INCOMEdb.sqlite")

##   cid     name    type notnull dflt_value pk
## 1   0      MNo INTEGER       0         NA  1
## 2   1     Name    TEXT       1         NA  0
## 3   2 Earnings INTEGER       1         NA  0
## 4   3   Gender    TEXT       1         NA  0

 

This output only tells us the attributes and constraints that we have given the columns of the table. We can’t see what the actual table looks like yet as we have only created the theoretical framework for it. Let’s begin the process of adding values to the table.

 

sqldf(c("INSERT INTO MEMBERS VALUES (1, 'Sally', 900, 'Female')",
      "INSERT INTO MEMBERS VALUES (2, 'Fred', 1200, 'Male')",
      "INSERT INTO MEMBERS VALUES (3, 'Mary', 2000, 'Female')"),
      dbname="INCOMEdb.sqlite")

## data frame with 0 columns and 0 rows

sqldf("SELECT * FROM MEMBERS",
      dbname="INCOMEdb.sqlite")

##   MNo  Name Earnings Gender
## 1   1 Sally      900 Female
## 2   2  Fred     1200   Male
## 3   3  Mary     2000 Female

 

Here, I have used the INSERT INTO statement into the MEMBERS table. I have added three rows for demonstrative purposes. Let’s take the first row: It will have a MNo of 1, Sally will be listed under Name, 900 is Sally’s Earnings to date, and Sally’s Gender is Female. The same logic can be applied to the second and third rows in the code.

 

Then we can see the entire MEMBERS table with the SELECT statement, and as you can see, the values programmed in the previous chuck of code are listed in the table.

 

One last point, I would like to touch on is deleting a table, or the DROP statement. This statement drops, or deletes, the entire table from the database. Are here is what that looks like:

 

 

sqldf("DROP TABLE MEMBERS", dbname = "INCOMEdb.sqlite")

## data frame with 0 columns and 0 rows

 

This will conclude my post on generating a basic database in R Studio. In the following posts we will get a deeper dive into SQL, and see what it might look like with more complicated functions and statements. Thank you for your support.



THANK YOU!


Another slight change of pace for this post. SQL is not typically used in R, but with the appropriate package in R Studio, we can program SQL functions in the environment and learn SQL this way. 


I wanted to post this so that you have a basic understanding of SQL, and I will be posting more SQL as it is a skill that is useful in data analytics. 


Thank you so much for visiting my blog! Enjoy your journey into a more "data-fied" world!
Share:
Location: Chicago, IL, USA