~~TOC 0-1~~ |<100% 25% - >| ^ \\ DATA ANALYTICS REFERENCE DOCUMENT\\ \\ ^^ ^ Document Title:|52553 - Applied Database| ^ Document No.:|1548347488| ^ Author(s):|Rita Raher, Gerhard van der Linde| ^ Contributor(s):| | **REVISION HISTORY** |< 100% 10% - - 10% 17% 10% >| ^ \\ Revision\\ \\ ^\\ Details of Modification(s)^\\ Reason for modification^ \\ Date ^ \\ By ^ | [[:doku.php?id=modules:52553&do=revisions|0]] |Draft release|52553 - Applied Database reference page| 2019/01/24 16:31 | Gerhard van der Linde | ---- ====== 52553 - Applied Databases ====== ===== Module Breakdown ===== * 40% continuous assessment * 60% for a final project I'll detail the breakdown of the 40% continuous assessment over the coming weeks in the module. ====== Week 1 - Introduction ====== ==== What is data ==== * Datum * Single piece of information fact or statistic.  * Data * A series of facts or statistics. * Types of Data * Non digital information. * Digital Information * Active Digital Footprint * Passive Digital Footprint ==== Ever increasing data… per minute ==== * 120+ new professionals join LinkedIn * 456,000 tweets sent * 3.6 million Google searches * 4.1 million YouTube videos watched * 18 million weather forecast requests received ===== Databases ===== * Relational Databases\\ {{:modules:52553:db_relational.png?nolink|}} * Non-Relational (NoSQL) Databases\\ {{:modules:52553:db_non-relational.png?nolink|}} ==== Relational Databases ==== * A relational database consists of a set of tables used for storing data. * A table is collection of related data * Each table has a unique name and may relate to one or more other tables in the database through common values. * A table in a database is a collection of rows and columns. Tables are also known as entities or relations. * A row contains data pertaining to a single item or record in a table. Rows are also known as records or tuples. * A column contains data representing a specific characteristic of the records in the table. Columns are also known as fields or attributes. ==== Spreadsheets ==== {{:modules:52553:spreadsheets_.png?nolink |}} {{:modules:52553:spreadsheets_1.png?nolink |}} ==== Database Schema ==== * A database consists of schemas, tables, views and other objects. * A database schema represents the logical configuration of all or part of a database. * It defines how the data, and relationships between the data, is stored * Two types of Schema: * Physical Schema - Defines out how data is stored physically on a storage system in terms of files and indices. * Logical Schema - Defines the logical constraints that apply to the stored data, the tables in the database and the relationships between them. ==== Logical Schema ==== * The Logical Schema is designed before the database is created. * No data is contained in the logical schema. {{:modules:52553:logical_schema_1.png?nolink|}} {{:modules:52553:logical_schema_2.png?nolink|}} ==== Spreadsheets vs Databases ==== {{:modules:52553:spreadsheets_vs_databases.png?nolink|}} ===== Database Management System (DBMS) ===== * A Database Management System (DBMS) is software for creating and managing databases. * The DBMS interacts with the user, the database itself, and other systems in order to store, retrieve and process data. * The DBMS provides a centralized view of data that can be accessed by multiple users, from multiple locations, in a controlled manner. * The DBMS can limit what data the end user sees, as well as how that end user can view the data, providing many views of a single database schema. * The DBMS provides data independence, freeing users (and application programs) from knowing where or how the data is stored. Any changes in how or where the data is stored is completely transparent due to the DBMS. * CRUD (Create, Read, Update, Delete) functions {{:modules:52553:dbms_management_systems_1.png?nolink|}} ===== DBMS Functions ===== * Data Storage Management * Security * Backup and Recovery * Transaction Management * Debit Customer a/c  * Update Shipping Table  * Update Products Table  * Credit Store a/c * Data integrity * Concurrency {{:modules:52553:data_integrity.png?nolink|}} ===== Advantages of DBMSs ===== * Controlling Redundancy * Data Integrity  * Enforcement of Standards  * Backup and Recovery  * Security {{:modules:52553:controlling_redundancy.png?nolink|}} ===== Disadvantages of DBMSs ===== * Complexity  * Size  * Performance  * Higher impact of failure ====== Week 2 - Getting Info from Databases ====== {{ :modules:52553:pdf:topic_2_-_lecture.pdf |}} ===== SQL ===== * Structured Query Language * Standard Relational Database Language * SQL is an ANSI/ISO standard, but different databases e.g. MySQL, SQL Server, Oracle may use their own proprietary extensions on top of the standard SQL. ===== What can SQL do? ===== **CRUD** * Create a new database  * Create tables in a database  * Insert data into a database  * Read data from a database  * Update data in a database  * Delete data from a database  * Manage transactions  * Manage concurrency  * Backup and recovery  * Manage users ===== SQL vs MySQL ===== * SQL is a language. * MySQL is a database management system. ===== Creating a database ===== mysql> create database myFirstDatabase; Query OK, 1 row affected (0.01 sec) mysql> CREATE dataBase MYFirstDATABASE; Query OK, 1 row affected (0.01 sec) mysql> create -> database -> MyFirstDatabase -> ; Query OK, 1 row affected (0.01 sec) ===== Using a Database ===== mysql> show databases; +--------------------+ | Database | +--------------------+ | information_schema | | myfirstdatabase | | mysql | | performance_schema | | sys | +--------------------+ 5 rows in set (0.00 sec) mysql> use myfirstdatabase; Database changed ===== Creating Tables ===== [[https://dev.mysql.com/doc/refman/8.0/en/data-types.html|MySQL Data Types]] ==== Car Attributes ==== * Make - Varchar(20) * Model - Varchar(20) * Registration - Varchar(20) * Colour - Varchar(20) * Mileage - Integer * Engine Size - float(2,1) * Cylinders * Crankshaft mysql> create table car ( -> make VARCHAR(20), -> model VARCHAR(20), -> registration VARCHAR(15), -> colour VARCHAR(10), -> mileage INTEGER, -> enginSize FLOAT(2,1)); Query OK, 0 rows affected (0.09 sec) ==== Describing Tables ==== mysql> describe car; +--------------+-------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +--------------+-------------+------+-----+---------+-------+ | make | varchar(20) | YES | | NULL | | | model | varchar(20) | YES | | NULL | | | registration | varchar(15) | YES | | NULL | | | colour | varchar(10) | YES | | NULL | | | mileage | int(11) | YES | | NULL | | | enginSize | float(2,1) | YES | | NULL | | +--------------+-------------+------+-----+---------+-------+ 6 rows in set (0.02 sec) === Primary key === Primary key - unique values. 1 primary key pair table add personID - int auto_increment flag to increase Using reg as primary key reg VARCHAR(15) PRIMARY KEY(reg) === SELECT === SELECT name FROM person; SELECT * FROM person; === WHERE === SELECT name FROM person where NOT isStudent SELECT name FROM person WHERE isStudent AND sex = “M”; === where >=, <=, Between === SELECT personID, NAME, AGE FROM person WHERE age >= 20 AND age <= 39; SELECT personID, NAME, AGE FROM person WHERE age between 20 and 39; === Like === SELECT personID, NAME, AGE FROM person WHERE name LIKE “%a%”; SELECT personID, NAME, AGE FROM person WHERE name LIKE “_a%”; === IN === SELECT personID, NAME, AGE FROM person WHERE age = 12 OR age = 13 OR age = 14 OR age = 15; === AND OR === SELECT personID, NAME, AGE FROM person WHERE age IN (12, 13, 14, 15); SELECT NAME, AGE FROM person WHERE sex =“M” AND name LIKE “S%” OR name LIKE “A%" === Limit === SELECT NAME, AGE FROM person WHERE sex =“M” AND name LIKE “S%” OR name LIKE “A%" LIMIT 1; SELECT NAME, AGE FROM person WHERE sex =“M” AND name LIKE “S%” OR name LIKE “A%" LIMIT 0,3; === DISTINCT === SELECT DISTINCT(name) === order by name DESC === YEAR() DAY() SELECT name, age, MONTHNAME(dob) FROM person WHERE day(DOB) between 1 and 11 AND name NOT like “A%" order by name DESC === Import a database === use show tables; drop database ; describe ; select * from cars; ====== Week3 - Applied Databases MySQL Functions and Procedures ====== ===== Functions ===== * Mysql can do more than store and retrieve data * It can also manipulate the data before storing or retrieving it, via functions. * A function is a piece of code that performs some operation and returns a result. * Some functions accept parameters, others do not ===== Built-in functions ===== * String functions https://dev.mysql.com/doc/refman/8.0/en/string-functions.html * Numeric functions https://dev.mysql.com/doc/refman/8.0/en/numeric-functions.html * Date and time functions https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html * Aggregate functions https://dev.mysql.com/doc/refman/8.0/en/group-by-functions-and-modifiers.html * MySql Information Functions https://dev.mysql.com/doc/refman/8.0/en/information-functions.html * MySQL Control Flow Functions https://dev.mysql.com/doc/refman/8.0/en/control-flow-functions.html ===== String Functions ===== * **Upper()** * Returns an uppercase version of a string * **STRCMP()**: Compares two strings and returns: * 0 if string 1 - string 2 * -1 if string 1 < string 2 * 1 if string 1 > string 2 * **ASCII()** * Returns the ASCII value of the first character in a string * **REPLACE(string, from_string, to_string)** * Replaces all occurrences of s substring within a string, with a new substring * string - The original string * from_string - The substring to be replaced * to_string - The new replacement string REPLACE(name, “Ms”, “MRS”) * **SUBSTR(string, start, length)** * extract a substring from a string * string - the start to extract from * start - The start position within the string * length - The number of character to be extracted SUBSTR(name, 1, 3)** * **SQRT(Number)** * Returns the square root of a number * **Round(number, decimals)** * Rounds a number to a specified number of decimal places ROUND(enginesize) * **DATEDIFF(date1, date2)** * returns the number of days between the 2 dates * **DATE_FORMAT(date, format)** * formats a date DATE_FORMAT(dob, "%d-%m-%y”) ===== Aggregate Functions ===== An aggregate function performs a calculation on a set of values and returns a single value. * **AVG()** * **MIN()** * **MAX()** * **SUM()** * **COUNT()** **Group By** The groupie statement is often used with aggregate functions(COUNT, MAX, MIN, SUM, AVG) to group the result-set by one or more columns SELECT level, AVG(experience) FROM teacher GROUP BY level; SELECT ROUND(AVG(mileage)) as “KMs” FROM car; SELECT model, ROUND(AVG(mileage)) as “KMs” FROM car GROUP BY model; SELECT model, ROUND(AVG(mileage)) as “KMs” FROM car WHERE mileage > 60000 GROUP BY model; ==== HAVING ==== * The HAVING clause is often used with the GROUP BY clause to filter groups based on a specified condition. * if the GROUP BY clause is omitted, the having clause behaves like the WHERE clause. * The HAVING clause applies a filter condition to each group of rows * The WHERE clause applies the filter condition to each initial row. SELECT model, ROUND(AVG(mileage)) as “KMs” FROM car WHERE mileage > 60000 GROUP BY model HAVING KMs > 250000; ==== Information Functions ==== **DATABASE()\\ USER()** select DATABASE(); select USER(); ==== Control Flow Functions ==== * IF(condition, value_if_true, value_if_false) * condition - value to test * value_if_true - Value to return if condition is True * value_if_false - Value to return if condition is False SELECT IF(150>200, “yes”, “no”)”T/F”; SELECT *,IF(experience >= 20 AND experience <= 45, “Y”, “”) as “Payrise Due” FROM teacher CASE WHEN condition 1 THEN result 1 WHEN condition 2 THEN result 1 WHEN condition n THEN result n ELSE result END SELECT name, dob, CASE WHEN month(dob) in (2,3,4) THEN “Spring” WHEN month(dob) in (5,6,7) THEN “Summer” WHEN month(dob) in (8,9,10) THEN “Autumn” WHEN month(dob) in (11,12,1) THEN “Winter” END as Season from person; SELECT name, dob, CASE WHEN month(dob) in (2,3,4) THEN “Spring” WHEN month(dob) in (5,6,7) THEN “Summer” ELSE “" END as Season from person; ==== Stored Routines ==== A stored routine is user-written code that extends the functionality Mysql. **Uses** * When multiple client apps are written in different languages or work on different platforms, but need to perform the same database operations. * To ensure security. Applications cannot directly access tables only stored routines. **Advantages** * Speed * Performance of applications accessing the database is increased. * This is because stored procedures are complied and stored in the database * Traffic * Instead of sending multiple lengthly SQL statements, the application has to send only the name and parameters of the stored routine. **Disadvantages** * Complexity * Not designed for complex business logic * Difficult to debug. Only a few database management systems allow to debug stored procedure. MySQL is not one of them. * Performance. A DBMS is not well-designed for logical operations. ==== MySQL Stored Functions ==== * A stored fountain is a special kind stored routine that returns a single value. * Stored functions are used to encapsulate common formulas or business logic rules that are reusable among Sql statement or stored routine. * A functions take 0 or more input parameters and return a single value https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html CREATE FUNCTION add2Nums(num1 integer, num2 integer) RETURNS integer DETERMINISTIC BEGINS RETURN num1 + num2; END SELECT add2Nums(3, 10); CREATE FUNCTION discount(age INT(11)) RETURNS VARCHAR(3) DETERMINISTIC BEGIN IF age < 16 THEN RETURN “0%”; ELSEIF age < 26 THEN RETURN “10%”; ELSEIF age < 40 THEN RETURN “20%”; ELSEIF age < 60 THEN RETURN “30%”; ELSE RETURN “40%”; END IF; END SELECT name, age, discount(age) “Discount” FROM person ==== Stored Procedures ==== ^FUNCTIONS^PROCEDURES^ |Return a single value|Retun 0 or more values| |Only select|select, insert, update, delete| |Cant use stored procedures|can use stored functions| |Does not support transactions|Support transactions| [[https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html]] Create procedure CREATE PROCEDURE make_mileage(mk VARCHAR(20), ml INT(11)) DETERMINISTIC BEGIN SELECT * FROM CAR WHERE make LIKE mk AND mileage < ml ORDER BY mileage; END **Call Procedure** call make_mileage(“Toyota”, 200000) call make_mileage(“Ford”, 5000) ==== MySQL Routine Management ==== **Finding Functions and Procedures** SELECT name, type from MYSQL.PROCE limit 3; **What’s in a function Procedure** SHOW CREATE FUNCTION add2nums; **Drop a Function or Procedure** DROP FUNCTION add2nums {{ :modules:52553:pdf:topic_3_exercises.pdf |Week 3 Excercises}} ====== Week 4 - Normalisation ====== {{ :modules:52553:pdf:topic_4_-_normalisation.pdf |}}\\ Normalization is the process of organizing the columns (attributes) and tables (relations) of a relational database to minimize data redundancy. ===== show create table ===== show create table manufacturer; +--------------+-------------------------------------+ | Table | Create Table | +--------------+-------------------------------------+ | manufacturer | CREATE TABLE `manufacturer` ( `manu_code` varchar(3) NOT NULL, `manu_name` varchar(200) NOT NULL, `manu_details` varchar(400) DEFAULT NULL, PRIMARY KEY (`manu_code`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 | +--------------+-------------------------------------+ 1 row in set (0.00 sec) ===== Get info from multiple tables ===== * INNER JOIN * LEFT JOIN ^INNER JOIN^LEFT JOIN^ |Return rows from two tables only when the JOIN condition is met.|Retrun rows from two tables when the JOIN condition is met.| |If JOIN condition is **not** met, nothing is returned from either table.|If JOIN condition is **not** met, rows from the first (LEFT) table are returned and NULL is returned instead of rows from the second table.| ===== Functions needed for exercises ===== select substring() {{:modules:mysql_substring.png?nolink|}} {{ :modules:52553:pdf:topic_4_exercises.pdf |}}\\ ====== Topic 5 - Insert Update Delete ====== {{:modules:52553:pdf:topic_5_-_insert_update_delete.pdf |}} ===== Insert ===== * CRUD (Create/Insert, Read, Update Delete)\\ https://dev.mysql.com/doc/refman/8.0/en/insert.html * INSERT INTO VALUES (value1, value2, valueN); * INSERT INTO
(column1, column2, column) \\ VALUES (value1, value2, valueN); INSERT INTO person VALUES(1, "John", 23, "M", 1); {{:modules:52553:mysql_insert.png?nolink&400|}} INSERT INTO person (age, sex, isStudent) VALUES (20, "F", 0); https://dev.mysql.com/doc/refman/8.0/en/working-with-null.html ===== Update ===== https://dev.mysql.com/doc/refman/8.0/en/update.html * UPDATE
SET column1 = value1, columnN, valueN; * UPDATE
SET column1 = value1, columnN, valueN \\ WHERE condition; UPDATE Person SET age = 23 WHERE personID = 3; UPDATE person set name = CONCAT(IF(sex="M", "MR.", "Ms.",name)); ===== Delete ===== https://dev.mysql.com/doc/refman/8.0/en/delete.html  * DELETE FROM
;  * DELETE FROM
\\ WHERE condition; DELETE from PERSON WHERE personID = 6; DELETE FROM person WHERE sex = "M" AND isStudent AND age > 20; ===== Foreign Keys ===== Foreign keys can be used to define table behavior when data is deleted. The default behavior for MySQL is ON DELETE RESTRICT, even if not specified. * ON DELETE RESTRICT * ON DELETE CASCADE * ON DELET SET NULL ^MySQL Command^MySQL Behaviour^ ^RESTRICT|Prevents entried from being deleted where a foreign key exists| ^CASCADE|Proceed to delete entries from the table worked on as well as deleting the referenced entries| ^SET NULL|Delete the entries after setting the references to NULL| ===== READ using SubQueries ===== SELECT emp_no, first_name, last_name FROM employees WHERE emp_no IN( SELECT emp_no FROM salaries WHERE salary = ( SELECT MAX(salary) FROM salaries ) ); {{page>52553_mongodb#topic_6_-_mongodb_i&noindent}} {{page>52553_mongodb#topic_7_-_mongodb_ii&noindent}} {{page>52553_python#topic8_-_python&noindent}} ====== Table of Contents ====== {{INLINETOC}}