~~TOC 0-1~~
|<100% 25% - >|
^ \\ DATA ANALYTICS REFERENCE DOCUMENT\\ \\ ^^
^ Document Title:|52553 - Applied Database|
^ Document No.:|1548347488|
^ Author(s):|Rita Raher, Gerhard van der Linde|
^ Contributor(s):| |
**REVISION HISTORY**
|< 100% 10% - - 10% 17% 10% >|
^ \\ Revision\\ \\ ^\\ Details of Modification(s)^\\ Reason for modification^ \\ Date ^ \\ By ^
| [[:doku.php?id=modules:52553&do=revisions|0]] |Draft release|52553 - Applied Database reference page| 2019/01/24 16:31 | Gerhard van der Linde |
----
====== 52553 - Applied Databases ======
===== Module Breakdown =====
* 40% continuous assessment
* 60% for a final project
I'll detail the breakdown of the 40% continuous assessment over the coming weeks in the module.
====== Week 1 - Introduction ======
==== What is data ====
* Datum
* Single piece of information fact or statistic.
* Data
* A series of facts or statistics.
* Types of Data
* Non digital information.
* Digital Information
* Active Digital Footprint
* Passive Digital Footprint
==== Ever increasing data… per minute ====
* 120+ new professionals join LinkedIn
* 456,000 tweets sent
* 3.6 million Google searches
* 4.1 million YouTube videos watched
* 18 million weather forecast requests received
===== Databases =====
* Relational Databases\\ {{:modules:52553:db_relational.png?nolink|}}
* Non-Relational (NoSQL) Databases\\ {{:modules:52553:db_non-relational.png?nolink|}}
==== Relational Databases ====
* A relational database consists of a set of tables used for storing data.
* A table is collection of related data
* Each table has a unique name and may relate to one or more other tables in the database through common values.
* A table in a database is a collection of rows and columns. Tables are also known as entities or relations.
* A row contains data pertaining to a single item or record in a table. Rows are also known as records or tuples.
* A column contains data representing a specific characteristic of the records in the table. Columns are also known as fields or attributes.
==== Spreadsheets ====
{{:modules:52553:spreadsheets_.png?nolink |}}
{{:modules:52553:spreadsheets_1.png?nolink |}}
==== Database Schema ====
* A database consists of schemas, tables, views and other objects.
* A database schema represents the logical configuration of all or part of a database.
* It defines how the data, and relationships between the data, is stored
* Two types of Schema:
* Physical Schema - Defines out how data is stored physically on a storage system in terms of files and indices.
* Logical Schema - Defines the logical constraints that apply to the stored data, the tables in the database and the relationships between them.
==== Logical Schema ====
* The Logical Schema is designed before the database is created.
* No data is contained in the logical schema.
{{:modules:52553:logical_schema_1.png?nolink|}}
{{:modules:52553:logical_schema_2.png?nolink|}}
==== Spreadsheets vs Databases ====
{{:modules:52553:spreadsheets_vs_databases.png?nolink|}}
===== Database Management System (DBMS) =====
* A Database Management System (DBMS) is software for creating and managing databases.
* The DBMS interacts with the user, the database itself, and other systems in order to store, retrieve and process data.
* The DBMS provides a centralized view of data that can be accessed by multiple users, from multiple locations, in a controlled manner.
* The DBMS can limit what data the end user sees, as well as how that end user can view the data, providing many views of a single database schema.
* The DBMS provides data independence, freeing users (and application programs) from knowing where or how the data is stored. Any changes in how or where the data is stored is completely transparent due to the DBMS.
* CRUD (Create, Read, Update, Delete) functions
{{:modules:52553:dbms_management_systems_1.png?nolink|}}
===== DBMS Functions =====
* Data Storage Management
* Security
* Backup and Recovery
* Transaction Management
* Debit Customer a/c
* Update Shipping Table
* Update Products Table
* Credit Store a/c
* Data integrity
* Concurrency
{{:modules:52553:data_integrity.png?nolink|}}
===== Advantages of DBMSs =====
* Controlling Redundancy
* Data Integrity
* Enforcement of Standards
* Backup and Recovery
* Security
{{:modules:52553:controlling_redundancy.png?nolink|}}
===== Disadvantages of DBMSs =====
* Complexity
* Size
* Performance
* Higher impact of failure
====== Week 2 - Getting Info from Databases ======
{{ :modules:52553:pdf:topic_2_-_lecture.pdf |}}
===== SQL =====
* Structured Query Language
* Standard Relational Database Language
* SQL is an ANSI/ISO standard, but different databases e.g. MySQL, SQL Server, Oracle may use their own proprietary extensions on top of the standard SQL.
===== What can SQL do? =====
**CRUD**
* Create a new database
* Create tables in a database
* Insert data into a database
* Read data from a database
* Update data in a database
* Delete data from a database
* Manage transactions
* Manage concurrency
* Backup and recovery
* Manage users
===== SQL vs MySQL =====
* SQL is a language.
* MySQL is a database management system.
===== Creating a database =====
mysql> create database myFirstDatabase;
Query OK, 1 row affected (0.01 sec)
mysql> CREATE dataBase MYFirstDATABASE;
Query OK, 1 row affected (0.01 sec)
mysql> create
-> database
-> MyFirstDatabase
-> ;
Query OK, 1 row affected (0.01 sec)
===== Using a Database =====
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| myfirstdatabase |
| mysql |
| performance_schema |
| sys |
+--------------------+
5 rows in set (0.00 sec)
mysql> use myfirstdatabase;
Database changed
===== Creating Tables =====
[[https://dev.mysql.com/doc/refman/8.0/en/data-types.html|MySQL Data Types]]
==== Car Attributes ====
* Make - Varchar(20)
* Model - Varchar(20)
* Registration - Varchar(20)
* Colour - Varchar(20)
* Mileage - Integer
* Engine Size - float(2,1)
* Cylinders
* Crankshaft
mysql> create table car (
-> make VARCHAR(20),
-> model VARCHAR(20),
-> registration VARCHAR(15),
-> colour VARCHAR(10),
-> mileage INTEGER,
-> enginSize FLOAT(2,1));
Query OK, 0 rows affected (0.09 sec)
==== Describing Tables ====
mysql> describe car;
+--------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| make | varchar(20) | YES | | NULL | |
| model | varchar(20) | YES | | NULL | |
| registration | varchar(15) | YES | | NULL | |
| colour | varchar(10) | YES | | NULL | |
| mileage | int(11) | YES | | NULL | |
| enginSize | float(2,1) | YES | | NULL | |
+--------------+-------------+------+-----+---------+-------+
6 rows in set (0.02 sec)
=== Primary key ===
Primary key - unique values. 1 primary key pair table
add personID - int auto_increment flag to increase
Using reg as primary key
reg VARCHAR(15)
PRIMARY KEY(reg)
=== SELECT ===
SELECT name FROM person;
SELECT * FROM person;
=== WHERE ===
SELECT name FROM person where NOT isStudent
SELECT name
FROM person
WHERE isStudent
AND sex = “M”;
=== where >=, <=, Between ===
SELECT personID, NAME, AGE
FROM person
WHERE age >= 20
AND age <= 39;
SELECT personID, NAME, AGE
FROM person
WHERE age between 20 and 39;
=== Like ===
SELECT personID, NAME, AGE
FROM person
WHERE name LIKE “%a%”;
SELECT personID, NAME, AGE
FROM person
WHERE name LIKE “_a%”;
=== IN ===
SELECT personID, NAME, AGE
FROM person
WHERE age = 12
OR age = 13
OR age = 14
OR age = 15;
=== AND OR ===
SELECT personID, NAME, AGE
FROM person
WHERE age IN
(12, 13, 14, 15);
SELECT NAME, AGE
FROM person
WHERE sex =“M”
AND name LIKE “S%”
OR name LIKE “A%"
=== Limit ===
SELECT NAME, AGE
FROM person
WHERE sex =“M”
AND name LIKE “S%”
OR name LIKE “A%"
LIMIT 1;
SELECT NAME, AGE
FROM person
WHERE sex =“M”
AND name LIKE “S%”
OR name LIKE “A%"
LIMIT 0,3;
=== DISTINCT ===
SELECT DISTINCT(name)
=== order by name DESC ===
YEAR()
DAY()
SELECT name, age, MONTHNAME(dob)
FROM person
WHERE day(DOB) between 1 and 11
AND name NOT like “A%"
order by name DESC
=== Import a database ===
use
====== Week3 - Applied Databases MySQL Functions and Procedures ======
===== Functions =====
* Mysql can do more than store and retrieve data
* It can also manipulate the data before storing or retrieving it, via functions.
* A function is a piece of code that performs some operation and returns a result.
* Some functions accept parameters, others do not
===== Built-in functions =====
* String functions https://dev.mysql.com/doc/refman/8.0/en/string-functions.html
* Numeric functions https://dev.mysql.com/doc/refman/8.0/en/numeric-functions.html
* Date and time functions https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html
* Aggregate functions https://dev.mysql.com/doc/refman/8.0/en/group-by-functions-and-modifiers.html
* MySql Information Functions https://dev.mysql.com/doc/refman/8.0/en/information-functions.html
* MySQL Control Flow Functions https://dev.mysql.com/doc/refman/8.0/en/control-flow-functions.html
===== String Functions =====
* **Upper()**
* Returns an uppercase version of a string
* **STRCMP()**: Compares two strings and returns:
* 0 if string 1 - string 2
* -1 if string 1 < string 2
* 1 if string 1 > string 2
* **ASCII()**
* Returns the ASCII value of the first character in a string
* **REPLACE(string, from_string, to_string)**
* Replaces all occurrences of s substring within a string, with a new substring
* string - The original string
* from_string - The substring to be replaced
* to_string - The new replacement string
REPLACE(name, “Ms”, “MRS”)
* **SUBSTR(string, start, length)**
* extract a substring from a string
* string - the start to extract from
* start - The start position within the string
* length - The number of character to be extracted
SUBSTR(name, 1, 3)**
* **SQRT(Number)**
* Returns the square root of a number
* **Round(number, decimals)**
* Rounds a number to a specified number of decimal places
ROUND(enginesize)
* **DATEDIFF(date1, date2)**
* returns the number of days between the 2 dates
* **DATE_FORMAT(date, format)**
* formats a date
DATE_FORMAT(dob, "%d-%m-%y”)
===== Aggregate Functions =====
An aggregate function performs a calculation on a set of values and returns a single value.
* **AVG()**
* **MIN()**
* **MAX()**
* **SUM()**
* **COUNT()**
**Group By**
The groupie statement is often used with aggregate functions(COUNT, MAX, MIN, SUM, AVG) to group the result-set by one or more columns
SELECT level, AVG(experience)
FROM teacher GROUP BY level;
SELECT ROUND(AVG(mileage)) as “KMs”
FROM car;
SELECT model, ROUND(AVG(mileage)) as “KMs”
FROM car
GROUP BY model;
SELECT model, ROUND(AVG(mileage)) as “KMs”
FROM car
WHERE mileage > 60000
GROUP BY model;
==== HAVING ====
* The HAVING clause is often used with the GROUP BY clause to filter groups based on a specified condition.
* if the GROUP BY clause is omitted, the having clause behaves like the WHERE clause.
* The HAVING clause applies a filter condition to each group of rows
* The WHERE clause applies the filter condition to each initial row.
SELECT model, ROUND(AVG(mileage)) as “KMs”
FROM car
WHERE mileage > 60000
GROUP BY model
HAVING KMs > 250000;
==== Information Functions ====
**DATABASE()\\
USER()**
select DATABASE();
select USER();
==== Control Flow Functions ====
* IF(condition, value_if_true, value_if_false)
* condition - value to test
* value_if_true - Value to return if condition is True
* value_if_false - Value to return if condition is False
SELECT IF(150>200, “yes”, “no”)”T/F”;
SELECT *,IF(experience >= 20 AND experience <= 45, “Y”, “”) as “Payrise Due”
FROM teacher
CASE WHEN condition 1 THEN result 1
WHEN condition 2 THEN result 1
WHEN condition n THEN result n
ELSE result
END
SELECT name, dob,
CASE
WHEN month(dob) in (2,3,4) THEN “Spring”
WHEN month(dob) in (5,6,7) THEN “Summer”
WHEN month(dob) in (8,9,10) THEN “Autumn”
WHEN month(dob) in (11,12,1) THEN “Winter”
END as Season
from person;
SELECT name, dob,
CASE
WHEN month(dob) in (2,3,4) THEN “Spring”
WHEN month(dob) in (5,6,7) THEN “Summer”
ELSE “"
END as Season
from person;
==== Stored Routines ====
A stored routine is user-written code that extends the functionality Mysql.
**Uses**
* When multiple client apps are written in different languages or work on different platforms, but need to perform the same database operations.
* To ensure security. Applications cannot directly access tables only stored routines.
**Advantages**
* Speed
* Performance of applications accessing the database is increased.
* This is because stored procedures are complied and stored in the database
* Traffic
* Instead of sending multiple lengthly SQL statements, the application has to send only the name and parameters of the stored routine.
**Disadvantages**
* Complexity
* Not designed for complex business logic
* Difficult to debug. Only a few database management systems allow to debug stored procedure. MySQL is not one of them.
* Performance. A DBMS is not well-designed for logical operations.
==== MySQL Stored Functions ====
* A stored fountain is a special kind stored routine that returns a single value.
* Stored functions are used to encapsulate common formulas or business logic rules that are reusable among Sql statement or stored routine.
* A functions take 0 or more input parameters and return a single value
https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html
CREATE FUNCTION add2Nums(num1 integer, num2 integer)
RETURNS integer
DETERMINISTIC
BEGINS
RETURN num1 + num2;
END
SELECT add2Nums(3, 10);
CREATE FUNCTION discount(age INT(11))
RETURNS VARCHAR(3)
DETERMINISTIC
BEGIN
IF age < 16 THEN
RETURN “0%”;
ELSEIF age < 26 THEN
RETURN “10%”;
ELSEIF age < 40 THEN
RETURN “20%”;
ELSEIF age < 60 THEN
RETURN “30%”;
ELSE
RETURN “40%”;
END IF;
END
SELECT name, age, discount(age) “Discount”
FROM person
==== Stored Procedures ====
^FUNCTIONS^PROCEDURES^
|Return a single value|Retun 0 or more values|
|Only select|select, insert, update, delete|
|Cant use stored procedures|can use stored functions|
|Does not support transactions|Support transactions|
[[https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html]]
Create procedure
CREATE PROCEDURE make_mileage(mk VARCHAR(20), ml INT(11))
DETERMINISTIC
BEGIN
SELECT * FROM CAR
WHERE make LIKE mk
AND mileage < ml
ORDER BY mileage;
END
**Call Procedure**
call make_mileage(“Toyota”, 200000)
call make_mileage(“Ford”, 5000)
==== MySQL Routine Management ====
**Finding Functions and Procedures**
SELECT name, type from MYSQL.PROCE limit 3;
**What’s in a function Procedure**
SHOW CREATE FUNCTION add2nums;
**Drop a Function or Procedure**
DROP FUNCTION add2nums
{{ :modules:52553:pdf:topic_3_exercises.pdf |Week 3 Excercises}}
====== Week 4 - Normalisation ======
{{ :modules:52553:pdf:topic_4_-_normalisation.pdf |}}\\
Normalization is the process of organizing the columns (attributes) and tables (relations) of a relational database to minimize data redundancy.
===== show create table =====
show create table manufacturer;
+--------------+-------------------------------------+
| Table | Create Table |
+--------------+-------------------------------------+
| manufacturer | CREATE TABLE `manufacturer` (
`manu_code` varchar(3) NOT NULL,
`manu_name` varchar(200) NOT NULL,
`manu_details` varchar(400) DEFAULT NULL,
PRIMARY KEY (`manu_code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+--------------+-------------------------------------+
1 row in set (0.00 sec)
===== Get info from multiple tables =====
* INNER JOIN
* LEFT JOIN
^INNER JOIN^LEFT JOIN^
|Return rows from two tables only when the JOIN condition is met.|Retrun rows from two tables when the JOIN condition is met.|
|If JOIN condition is **not** met, nothing is returned from either table.|If JOIN condition is **not** met, rows from the first (LEFT) table are returned and NULL is returned instead of rows from the second table.|
===== Functions needed for exercises =====
select substring()
{{:modules:mysql_substring.png?nolink|}}
{{ :modules:52553:pdf:topic_4_exercises.pdf |}}\\
====== Topic 5 - Insert Update Delete ======
{{:modules:52553:pdf:topic_5_-_insert_update_delete.pdf |}}
===== Insert =====
* CRUD (Create/Insert, Read, Update Delete)\\ https://dev.mysql.com/doc/refman/8.0/en/insert.html
* INSERT INTO