Where better to start than from the beginning? I would like to use this post to achieve the following objective:
1.
Create a basic database by using DDL and DML
statements.
SQL is divided into two main families of statements, DDL and
DML.
DDL, or Data Definition Language, encompasses the statements CREATE, DROP, ALTER, RENAME. And DML, or Data Manipulation Language contains the statements SELECT, INSERT, UPDATE, DELATE.
These statements have different functionality
within the entire framework of SQL and will be identified more specifically
later.
Let’s start!
We will begin by loading the “sqldf” package into our
environment and turn off any warnings that we might receive.
options(warn=-1)
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
Great! Now we can create our first relational database.
Let’s create a database that tracks income over a period of time. (We won’t be
focused on specificity or complexity at first, however). We will call our
database, INCOME. Here is how we will create it:
db <- dbConnect(SQLite(), dbname="INCOMEdb.sqlite")
Notice that we have created an object called “db,”
short for database, and that we have given it the title of “INCOME” as
expressed in the dbname argument.
In that database, we will create our first table called MEMBERS. In that table, we will add four columns. This table is meant to be basic, at first, as we can add columns later in our journey to have more specific information regarding the observations within the table.
For now, this
is what the code looks like if we add the four columns, MNo (as in member
number), Name, Earnings, Gender (Male or Female).
sqldf("CREATE TABLE MEMBERS
(MNo INTEGER PRIMARY KEY,
Name TEXT NOT NULL,
Earnings INTEGER NOT NULL,
Gender TEXT NOT NULL
)",dbname="INCOMEdb.sqlite")
## data frame with 0 columns and 0 rows
Before I display the table in our environment, I would
like to describe the attributes and constraints that I have given each column.
This is important, as it helps the algorithm determine what types of
information is related to that column.
Firstly, I used the CREATE TABLE statement, and created a
table called MEMBERS. Then I gave the column called “MNo” the attribute of an
INTEGER (which is a whole number), and it is a Primary Key, as in it is unique
attribute related this table.
Next, I gave the “Name” column the attribute of TEXT.
Meaning that the information stored in this column will have text, and not a
number. I also constrained this column to be NOT NULL. And this means that the
values in this column are known and NOT MISSING.
The “Earnings” column will also be an integer, as incomes
are expressed in numbers. It will have the constraint of NOT NULL. This
information will be recorded for every person in that is listed in the MEMBERS
table, and as a result, there will be no missing information pertaining to
“Earnings.”
Finally, Gender will have the attribute of TEXT (to be
recorded as Male or Female), and this information will always be recorded in
the MEMBERS table, so we will give the constraint of NOT NULL.
This table is very basic at the moment. Later on, when
creating a database that is more informative, we might add additional columns
like “Job Function” or ”Role.”
Let’s see what the table looks like in our environment, so
far.
sqldf("pragma table_info(MEMBERS)",dbname="INCOMEdb.sqlite")
## cid name
type notnull dflt_value pk
## 1 0 MNo INTEGER 0
NA 1
## 2 1 Name
TEXT 1 NA
0
## 3 2
Earnings INTEGER 1 NA
0
## 4 3 Gender
TEXT 1 NA
0
This output only tells us the attributes and
constraints that we have given the columns of the table. We can’t see what the actual
table looks like yet as we have only created the theoretical framework for it.
Let’s begin the process of adding values to the table.
sqldf(c("INSERT
INTO MEMBERS VALUES (1, 'Sally', 900, 'Female')",
"INSERT INTO MEMBERS VALUES (2, 'Fred', 1200,
'Male')",
"INSERT INTO MEMBERS VALUES (3, 'Mary', 2000,
'Female')"),
dbname="INCOMEdb.sqlite")
## data frame with 0 columns and 0 rows
sqldf("SELECT * FROM MEMBERS",
dbname="INCOMEdb.sqlite")
## MNo Name Earnings Gender
## 1 1
Sally 900 Female
## 2 2 Fred
1200 Male
## 3 3 Mary
2000 Female
Here, I have used the INSERT INTO statement into the
MEMBERS table. I have added three rows for demonstrative purposes. Let’s take
the first row: It will have a MNo of 1, Sally will be listed under Name, 900 is
Sally’s Earnings to date, and Sally’s Gender is Female. The same logic can be
applied to the second and third rows in the code.
Then we can see the entire MEMBERS table with the SELECT
statement, and as you can see, the values programmed in the previous chuck of
code are listed in the table.
One last point, I would like to touch on is deleting a
table, or the DROP statement. This statement drops, or deletes, the entire
table from the database. Are here is what that looks like:
sqldf("DROP TABLE MEMBERS", dbname = "INCOMEdb.sqlite")
## data frame with 0 columns and 0 rows