User Tools

Site Tools


modules:52553

DATA ANALYTICS REFERENCE DOCUMENT

Document Title:52553 - Applied Database
Document No.:1548347488
Author(s):Rita Raher, Gerhard van der Linde
Contributor(s):

REVISION HISTORY


Revision


Details of Modification(s)

Reason for modification

Date

By
0 Draft release52553 - Applied Database reference page 2019/01/24 16:31 Gerhard van der Linde

52553 - Applied Databases

Module Breakdown

  • 40% continuous assessment
  • 60% for a final project

I'll detail the breakdown of the 40% continuous assessment over the coming weeks in the module.

Week 1 - Introduction

What is data

  • Datum
  • Single piece of information fact or statistic. 
  • Data
    • A series of facts or statistics.
  • Types of Data
    • Non digital information.
    • Digital Information
      • Active Digital Footprint
      • Passive Digital Footprint

Ever increasing data… per minute

  • 120+ new professionals join LinkedIn
  • 456,000 tweets sent
  • 3.6 million Google searches
  • 4.1 million YouTube videos watched
  • 18 million weather forecast requests received

Databases

  • Relational Databases
  • Non-Relational (NoSQL) Databases

Relational Databases

  • A relational database consists of a set of tables used for storing data.
  • A table is collection of related data
  • Each table has a unique name and may relate to one or more other tables in the database through common values.
  • A table in a database is a collection of rows and columns. Tables are also known as entities or relations.
  • A row contains data pertaining to a single item or record in a table. Rows are also known as records or tuples.
  • A column contains data representing a specific characteristic of the records in the table. Columns are also known as fields or attributes.

Spreadsheets

Database Schema

  • A database consists of schemas, tables, views and other objects.
  • A database schema represents the logical configuration of all or part of a database.
  • It defines how the data, and relationships between the data, is stored
  • Two types of Schema:
    • Physical Schema - Defines out how data is stored physically on a storage system in terms of files and indices.
    • Logical Schema - Defines the logical constraints that apply to the stored data, the tables in the database and the relationships between them.

Logical Schema

  • The Logical Schema is designed before the database is created.
  • No data is contained in the logical schema.

Spreadsheets vs Databases

Database Management System (DBMS)

  • A Database Management System (DBMS) is software for creating and managing databases.
  • The DBMS interacts with the user, the database itself, and other systems in order to store, retrieve and process data.
  • The DBMS provides a centralized view of data that can be accessed by multiple users, from multiple locations, in a controlled manner.
  • The DBMS can limit what data the end user sees, as well as how that end user can view the data, providing many views of a single database schema.
  • The DBMS provides data independence, freeing users (and application programs) from knowing where or how the data is stored. Any changes in how or where the data is stored is completely transparent due to the DBMS.
  • CRUD (Create, Read, Update, Delete) functions

DBMS Functions

  • Data Storage Management
  • Security
  • Backup and Recovery
  • Transaction Management
  • Debit Customer a/c 
  • Update Shipping Table 
  • Update Products Table 
  • Credit Store a/c
  • Data integrity
  • Concurrency

Advantages of DBMSs

  • Controlling Redundancy
  • Data Integrity 
  • Enforcement of Standards 
  • Backup and Recovery 
  • Security

Disadvantages of DBMSs

  • Complexity 
  • Size 
  • Performance 
  • Higher impact of failure

Week 2 - Getting Info from Databases

SQL

  • Structured Query Language
  • Standard Relational Database Language
  • SQL is an ANSI/ISO standard, but different databases e.g. MySQL, SQL Server, Oracle may use their own proprietary extensions on top of the standard SQL.

What can SQL do?

CRUD

  • Create a new database 
  • Create tables in a database 
  • Insert data into a database 
  • Read data from a database 
  • Update data in a database 
  • Delete data from a database 
  • Manage transactions 
  • Manage concurrency 
  • Backup and recovery 
  • Manage users

SQL vs MySQL

  • SQL is a language.
  • MySQL is a database management system.

Creating a database

mysql> create database myFirstDatabase;
Query OK, 1 row affected (0.01 sec)
mysql> CREATE dataBase MYFirstDATABASE;
Query OK, 1 row affected (0.01 sec)
mysql> create
    -> database
    -> MyFirstDatabase
    -> ;
Query OK, 1 row affected (0.01 sec)

Using a Database

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| myfirstdatabase    |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
5 rows in set (0.00 sec)
mysql> use myfirstdatabase;
Database changed

Creating Tables

Car Attributes

  • Make - Varchar(20)
  • Model - Varchar(20)
  • Registration - Varchar(20)
  • Colour - Varchar(20)
  • Mileage - Integer
  • Engine Size - float(2,1)
  • Cylinders
  • Crankshaft
mysql> create table car (
    -> make VARCHAR(20),
    -> model VARCHAR(20),
    -> registration VARCHAR(15),
    -> colour VARCHAR(10),
    -> mileage INTEGER,
    -> enginSize FLOAT(2,1));
Query OK, 0 rows affected (0.09 sec)

Describing Tables

mysql> describe car;
+--------------+-------------+------+-----+---------+-------+
| Field        | Type        | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| make         | varchar(20) | YES  |     | NULL    |       |
| model        | varchar(20) | YES  |     | NULL    |       |
| registration | varchar(15) | YES  |     | NULL    |       |
| colour       | varchar(10) | YES  |     | NULL    |       |
| mileage      | int(11)     | YES  |     | NULL    |       |
| enginSize    | float(2,1)  | YES  |     | NULL    |       |
+--------------+-------------+------+-----+---------+-------+
6 rows in set (0.02 sec)

Primary key

Primary key - unique values. 1 primary key pair table add personID - int auto_increment flag to increase

Using reg as primary key

reg VARCHAR(15)
PRIMARY KEY(reg)

SELECT

SELECT name FROM person;
 
SELECT * FROM person;

WHERE

SELECT name FROM person where NOT isStudent
 
SELECT name 
 FROM person
WHERE isStudent 
AND sex = “M”;

where >=, <=, Between

SELECT personID, NAME, AGE
FROM person
WHERE age >= 20
AND age <= 39;
 
SELECT personID, NAME, AGE
FROM person
WHERE age between 20 and 39;

Like

SELECT personID, NAME, AGE
FROM person
WHERE name LIKE%a%;
 
SELECT personID, NAME, AGE
FROM person
WHERE name LIKE “_a%;

IN

SELECT personID, NAME, AGE
FROM person
WHERE  age = 12
OR age = 13
OR age = 14
OR age = 15;

AND OR

SELECT personID, NAME, AGE
FROM person
WHERE  age  IN
 (12, 13, 14, 15);
 
SELECT NAME, AGE
FROM person
WHERE  sex =“M”
AND name LIKE “S%OR name LIKE “A%"

Limit

SELECT NAME, AGE
FROM person
WHERE  sex =“M”
AND name LIKE “S%OR name LIKE “A%"
LIMIT 1;
 
 
SELECT NAME, AGE
FROM person
WHERE  sex =“M”
AND name LIKE “S%”
OR name LIKE “A%"
LIMIT 0,3;

DISTINCT

order by name DESC

YEAR() DAY()

SELECT name, age, MONTHNAME(dob)
FROM person
WHERE day(DOB) between 1 and 11
AND name NOT like “A%"  
order by name DESC

Import a database

Week3 - Applied Databases MySQL Functions and Procedures

Functions

  • Mysql can do more than store and retrieve data
  • It can also manipulate the data before storing or retrieving it, via functions.
  • A function is a piece of code that performs some operation and returns a result.
  • Some functions accept parameters, others do not

Built-in functions

String Functions

  • Upper()
    • Returns an uppercase version of a string
  • STRCMP(): Compares two strings and returns:
    • 0 if string 1 - string 2
    • -1 if string 1 < string 2
    • 1 if string 1 > string 2
  • ASCII()
    • Returns the ASCII value of the first character in a string
  • REPLACE(string, from_string, to_string)
    • Replaces all occurrences of s substring within a string, with a new substring
      • string - The original string
      • from_string - The substring to be replaced
      • to_string - The new replacement string
REPLACE(name, “Ms”, “MRS”)
  • SUBSTR(string, start, length)
    • extract a substring from a string
      • string - the start to extract from
      • start - The start position within the string
      • length - The number of character to be extracted
SUBSTR(name, 1, 3)**
  • SQRT(Number)
    • Returns the square root of a number
  • Round(number, decimals)
    • Rounds a number to a specified number of decimal places
ROUND(enginesize)
  • DATEDIFF(date1, date2)
    • returns the number of days between the 2 dates
  • DATE_FORMAT(date, format)
    • formats a date
DATE_FORMAT(dob, "%d-%m-%y”)

Aggregate Functions

An aggregate function performs a calculation on a set of values and returns a single value.

  • AVG()
  • MIN()
  • MAX()
  • SUM()
  • COUNT()

Group By The groupie statement is often used with aggregate functions(COUNT, MAX, MIN, SUM, AVG) to group the result-set by one or more columns

SELECT level, AVG(experience)
FROM teacher GROUP BY level;
 
SELECT ROUND(AVG(mileage)) as “KMs”
FROM car;
 
SELECT model,  ROUND(AVG(mileage)) as “KMs”
FROM car
GROUP BY model;
 
 
SELECT model,  ROUND(AVG(mileage)) as “KMs”
FROM car
WHERE mileage > 60000
GROUP BY model;

HAVING

  • The HAVING clause is often used with the GROUP BY clause to filter groups based on a specified condition.
  • if the GROUP BY clause is omitted, the having clause behaves like the WHERE clause.
  • The HAVING clause applies a filter condition to each group of rows
  • The WHERE clause applies the filter condition to each initial row.
SELECT model,  ROUND(AVG(mileage)) as “KMs”
FROM car
WHERE mileage > 60000
GROUP BY model
HAVING KMs > 250000;

Information Functions

DATABASE()
USER()

select DATABASE();
select USER();

Control Flow Functions

  • IF(condition, value_if_true, value_if_false)
    • condition - value to test
    • value_if_true - Value to return if condition is True
    • value_if_false - Value to return if condition is False
SELECT IF(150>200, “yes”,no)”T/F”;
SELECT *,IF(experience >= 20 AND experience <= 45,Y, “”) as “Payrise Due”
FROM teacher
CASE WHEN condition 1 THEN result 1
	WHEN condition 2 THEN result 1
	WHEN condition n THEN result n 
	ELSE result
END
SELECT name, dob,
CASE
	WHEN month(dob) in (2,3,4) THEN “Spring”
	WHEN month(dob) in (5,6,7) THEN “Summer”
	WHEN month(dob) in (8,9,10) THEN “Autumn”
	WHEN month(dob) in (11,12,1) THEN “Winter”
END as Season
from person;
SELECT name, dob,
CASE
	WHEN month(dob) in (2,3,4) THEN “Spring”
	WHEN month(dob) in (5,6,7) THEN “Summer”
	ELSE"
END as Season
from person;

Stored Routines

A stored routine is user-written code that extends the functionality Mysql.

Uses

  • When multiple client apps are written in different languages or work on different platforms, but need to perform the same database operations.
  • To ensure security. Applications cannot directly access tables only stored routines.

Advantages

  • Speed
    • Performance of applications accessing the database is increased.
    • This is because stored procedures are complied and stored in the database
  • Traffic
    • Instead of sending multiple lengthly SQL statements, the application has to send only the name and parameters of the stored routine.

Disadvantages

  • Complexity
  • Not designed for complex business logic
  • Difficult to debug. Only a few database management systems allow to debug stored procedure. MySQL is not one of them.
  • Performance. A DBMS is not well-designed for logical operations.

MySQL Stored Functions

  • A stored fountain is a special kind stored routine that returns a single value.
  • Stored functions are used to encapsulate common formulas or business logic rules that are reusable among Sql statement or stored routine.
  • A functions take 0 or more input parameters and return a single value

https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html

CREATE FUNCTION add2Nums(num1 integer, num2 integer)
RETURNS integer
DETERMINISTIC
BEGINS
	RETURN num1 + num2;
END
 
SELECT add2Nums(3, 10);
CREATE FUNCTION discount(age INT(11))
RETURNS VARCHAR(3)
DETERMINISTIC
BEGIN
	IF age < 16 THEN
		RETURN “0%;
	ELSEIF age < 26 THEN
		RETURN “10%;
	ELSEIF age < 40 THEN
		RETURN “20%;
	ELSEIF age < 60 THEN
		RETURN “30%;
	ELSE 
		RETURN “40%;
	END IF;
END
SELECT name, age, discount(age) “Discount”
FROM person

Stored Procedures

FUNCTIONSPROCEDURES
Return a single valueRetun 0 or more values
Only selectselect, insert, update, delete
Cant use stored procedurescan use stored functions
Does not support transactionsSupport transactions

https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html

Create procedure
CREATE PROCEDURE make_mileage(mk VARCHAR(20), ml INT(11))
DETERMINISTIC
BEGIN
	SELECT * FROM CAR
	WHERE make LIKE mk
	AND mileage < ml
	ORDER BY mileage;
END

Call Procedure

call make_mileage(“Toyota”, 200000)
call make_mileage(“Ford”, 5000)

MySQL Routine Management

Finding Functions and Procedures

SELECT name, type from MYSQL.PROCE limit 3;

What’s in a function Procedure

SHOW CREATE FUNCTION add2nums;

Drop a Function or Procedure

DROP FUNCTION add2nums

Week 3 Excercises

Week 4 - Normalisation

topic_4_-_normalisation.pdf

Normalization is the process of organizing the columns (attributes) and tables (relations) of a relational database to minimize data redundancy.

show create table

show create table manufacturer;
+--------------+-------------------------------------+
| Table        | Create Table                        |
+--------------+-------------------------------------+
| manufacturer | CREATE TABLE `manufacturer` (
  `manu_code` varchar(3) NOT NULL,
  `manu_name` varchar(200) NOT NULL,
  `manu_details` varchar(400) DEFAULT NULL,
  PRIMARY KEY (`manu_code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+--------------+-------------------------------------+
1 row in set (0.00 sec)

Get info from multiple tables

  • INNER JOIN
  • LEFT JOIN
INNER JOINLEFT JOIN
Return rows from two tables only when the JOIN condition is met.Retrun rows from two tables when the JOIN condition is met.
If JOIN condition is not met, nothing is returned from either table.If JOIN condition is not met, rows from the first (LEFT) table are returned and NULL is returned instead of rows from the second table.

Functions needed for exercises

Topic 5 - Insert Update Delete

Insert

INSERT INTO person VALUES(1, "John", 23, "M", 1);

INSERT INTO person (age, sex, isStudent) VALUES (20, "F", 0);

https://dev.mysql.com/doc/refman/8.0/en/working-with-null.html

Update

https://dev.mysql.com/doc/refman/8.0/en/update.html

  • UPDATE <table> SET column1 = value1, columnN, valueN;
  • UPDATE <table> SET column1 = value1, columnN, valueN
    WHERE condition;
UPDATE Person
  SET age = 23
  WHERE personID = 3;
 
UPDATE person
  set name = CONCAT(IF(sex="M", "MR.", "Ms.",name));

Delete

https://dev.mysql.com/doc/refman/8.0/en/delete.html  * DELETE FROM <table>;  * DELETE FROM <table>
WHERE condition;

DELETE from PERSON
  WHERE personID = 6;
 
DELETE FROM person
  WHERE sex = "M"
  AND isStudent
  AND age > 20;

Foreign Keys

Foreign keys can be used to define table behavior when data is deleted. The default behavior for MySQL is ON DELETE RESTRICT, even if not specified.

  • ON DELETE RESTRICT
  • ON DELETE CASCADE
  • ON DELET SET NULL
MySQL CommandMySQL Behaviour
RESTRICTPrevents entried from being deleted where a foreign key exists
CASCADEProceed to delete entries from the table worked on as well as deleting the referenced entries
SET NULLDelete the entries after setting the references to NULL

READ using SubQueries

SELECT emp_no, first_name, last_name
  FROM employees
  WHERE emp_no IN(
    SELECT emp_no
    FROM salaries 
    WHERE salary = (
      SELECT MAX(salary)
      FROM salaries
    )
  );

Topic 6 - MongoDB I

Why NoSQL Databases?

Scalability

Scale Up/Vertically: means moving the database to a bigger server.

Scale Out/Horizontally

Unstructured Data

Add on new features later on like email and then twitter etc…

MongoDB

  • Document Database
  • Schemaless
  • Horizontal Scalability Through sharding1)
  • Duplication of data

JSON

  • JSON - JavaScript Object Notation
  • Lightweight data-interchange format
  • Machine/Human readable
  • Language independent
  • JSON Structure
    • Name/Value pair
    • Ordered Lists

JSON Datatypes

Number

{
  "id" : 1
}
{
  "id" : 3.14
}

Note that there is no distinction between integer and floating point numbers.

String

{
  "id" : 1,
  "fname" : "John"
}

Boolean

{
  "reg" : "09-G-13"
  "hybrid" : false 
}

Array

{
  "student" : "G00257854"
  "subjects" : ["Databases", "Java", "Mobile Apps"]
}

Object Document

{
  "student" : "G00257854"
  "address" : {
    "street" : "Castle Street"
    "town" : "Athenry"
    "county" : "Galway"
  }
}

JSON USES

MongoDB, JSON and BSON

  • JSON object = MongoDB document
  • Internally, MongoDB represents JSON documents in binary-encoded format called BSON (Binary JavaScript Object Notation)
  • BSON extends JSOM model to provide additional data types as well as indexes.

MongoDB Structures

Document FIXME - slide 12….

A document is record in a MongoDB collection and the basic unit of data in MongoDB. Documents are analogous to JSON objects or records in an RDBMS.

Collection

  • A grouping of MongoDB documents.
  • Collections are analogous to RDBMS tables.
  • A collection exists within a single database.
  • Collections do not enforce a schema. Documents within a collection can have different fields.
  • Typically, all documents in a collection have a similar or related purpose.

Database A number of databases can be run on a single MongoDB server.

MongoDB Commands

  • show dbs - Show Databases
  • use myDB - Switch to databases named “myDB” (If it doesn't exist, Mongo creates it)
  • db - Show current Database.
  • show collections - Show collections in the current database

MongoDB Rules for creating a Document

  • Rules for MongoDB documents
    • A document must have an _id field. if one is not provided, it will be automatically generated
    • The _id cannot be an array

Create a document - save()

Query the database - find()

pretty()

$and

$or

$in

Attribute

Attribute and age is greater than 20

findOne()

sort()

MongoDB -_id

  • As previously described, the document ID (_id) attribute of a mongoDB document is the only mandatory part of a document.
  • It can be any value, except an array.

more on save()

insert()

  • Insert a document or documents into a collection.

update()

  • Modifies an existing document or docents in a collection
  • Update (query, update, options)2)

Does not update Mary

$set

deleteOne()

  • Removes a single document from a collection

deleteMany()

Operators

Update Operators

Logical Query Operators

Comparison Query Operators

Topic 7 - MongoDB II

More on find()

db.user.find()

To find only documents that have an email attribute and age is greater than 20

db.user.find({$and:[{email: {$exists:true}}, {age:{$gt:20}}]})

find(query, projection)

db.User.find()

Return only the email attribute of documents where age is greater than 18

db.User.find({age: {$gt: 20}}, {email:1})

Return only the first_name and surname attributes of all documents

db.User.find({}, {_id:false, first_name:1, surname:1})

aggregate()

  • Calculates aggregate values for the data in a collection
  • db.collection.aggregate(pipeline, options)
    • pipeline stages
    • pipeline Operators

Example

Get the average gpa for all students

db.users.aggregate([{$group:{_id:null, Average{$avg:"$gpa"}}}])

$group same as Group by in MYSQL

Result:

Get the Maximum GPA per age group

db.march8.aggregate([{$group:{_id:"$age", "Max GPA per Age":{$max:"$gpa"}}}])

To sort: $sort

db.march8.aggregate([{$group:{_id:"$age", "Max GPA per Age":{$max:"$gpa"}}}, {$sort:{_id:1}}])

Indexing

db.user.find()

Return all documents where age is greater than 18

Psuedo code example

for each document d in 'user'{
  if(d.age == 35){
    return d;
  }
}
  • Indexes support the efficient execution of queries in MongoDB.
  • Without indexes, MongoDB must perform a collection scan, i.e scan every document in a collection, to select those documents that match the query statement.
  • Indexes are special data structures that store a small portion of the collection's data set in an easy to traverse form.
  • Indexes hold mappings from field values to document locations.

getIndexes()

  • By default the only index on a document is on the _id field.
  • To find the indexes on a collection:
db.collection.getIndexes()

Which returns information in the following format, detailing the index field (_id) and the order of the indexes(1 is ascending:-1 is descending):

"key":{
 
  "_id":1
}

createIndex()

  • To create an index on a field other than _id:
  • db.collection.createIndex()

db.user.createIndex({age:1})

dropIndex()

  • To drop an index on a field use:
db.collection.dropIndex()
  • To drop the index on the age field we just created use:
db.collection.dropIndex({age:1})
  • Note: The index on _id cannot be dropped

sort()

  • When a sort() is performed on a field that is not an index, MongoDB will sort the results in memory.
  • If the sort() method consumes more than 32MB of memory, MongoDB aborts the sort.
  • To avoid this error, create an index supporting the sort operation.

__Relationships__ in MongoDB

  • Modelling relationships between documents
    • One-to-One Relationships with Embedded Documents
    • One-to-many Relationships with embedded Documents
    • One-to-many relationships with document references

One-to-One relationships with embedded documents

db.student.save({_id:"G00789445", 
                 name: "John", 
                 address:{_id: 100, 
                          town: "Athenry", 
                          county:"Galway"}})
db.student.find({}, {address:1})
{ "_id" : "G00789445",
  "address" : {
    "_id" : 100,
    "town" : "Athenry",
    "county" : "Galway"
  }
}
  • Show only the county field of documents that have an address field.
db.student.find({address:{$exists: true}}, {_id:0, "address.county":1})                 

Note: Observe details in projection part of find, i.e. find(query, projection), {_id:0, “address.county”:1}

{_id:0, “address.county”:1}Meaning
_id:0Do NOT output _id field
“address.county”:1 Only output county field

One-to-Many Relationships with Embedded Documents

Create the document with the relationships

db.student.save({_id:"G00101224", 
                 name:"Mary", 
                 modules:[{_id:"M100", module:"Databases"},
                           _id:"M101", module:"Java"}]})            

Show the student's _id and module of all modules taken by student G00101224

db.student.find({_id:"G00101224"}, {"modules.module":1})          

projection - only show the module of the modules field

{"_id":"G00101224", "modules":[{"module":"Databases"},{"module":"Java"}]}

One-to-Many relationships with document References

In the example the document has only two field, but in reality it can be a very long document with much more information, so it makes sense to use relationships instead.

with referencing

//save the modules to the docs collection
db.docs.save({_id:"M100", module:"Databases"})      
db.docs.save({_id:"M101", module:"Java"})        
//save the students to the docs collection with references to the modules using the module _id fields.
db.docs.save({_id:"G00101224", name:"Mary", modules["M100", "M101"]})
db.docs.save({_id:"G00205266", name:"Sarah", modules["M100", "M101"]})

$lookup

Using the $lookup pipeline3)

Similar to a join in MySQL…

Performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing. To each input document, the $lookup stage adds a new array field whose elements are the matching documents from the “joined” collection. The $lookup stage passes these reshaped documents to the next stage.

{
   $lookup:
     {
       from: <collection to join>,
       localField: <field from the input documents>,
       foreignField: <field from the documents of the "from" collection>,
       as: <output array field>
     }
}

from - specifies the collection in the same databases to perform the join with. The from collection cannot be sharded.

localField - The value to search for.

foreignField - The field to search for the value specified by localField.

as - The name of the output.

Return all documents including the complete referenced documents

db.docs.aggregate([{$lookup:{from:"docs", localField:"modules", foreignField: "_id", as:"Details"}}])

Embedded Documents vs Referenced Documents

Features of embedded Documents

  • Better performance
  • Atomic

Features of Referenced Documents

  • Slower
  • No repetition
  • More complex relationships

MongoDB vs MySQL

Features of MongoDB

  • Huge amounts of data
  • Unstructured
  • Doesn't really support relationships

Features of MySQL

  • Very Stable
  • Structured
  • Integrity

DATA ANALYTICS REFERENCE DOCUMENT

Document Title:Applied Databases - Python
Document No.:1553629016
Author(s):Gerhard van der Linde, Rita Raher
Contributor(s):

REVISION HISTORY


Revision


Details of Modification(s)

Reason for modification

Date

By
0 Draft releaseApplied Databases - Python 2019/03/26 19:36 Gerhard van der Linde

Topic8 - Python I

Databases vs Program

Employee IDNameDeptSalary
100JohnHR25500
101MaryR&D44500
102BillR&D43000
103TomSales40000
SELECT * from employees WHERE Salary > 42000;
SELECT * from employees WHERE Salary < 30000;

Variables

  • Variables are names areas in the computer's memory that store values.
variables.py
my1stVariable = "Hello World"
my2ndVariable = 1
 
print(my1stVariable)
#Hello World
 
my2ndVariable + 4
 
print(my2ndVariable)
# 1
  • Variables are named areas in the computer's memory that store values.
variables2.py
my2ndVariable = 1
x = my2ndVariable + 4
 
 
print(my2ndVariable)
# 1
print(x)
# 5
 
age = 21
age = age + 1
 
print(age) # 22

IF Statements

ifstatements.py
age = 17
 
if(age > 17):
   print("OK")
 
print(finished)
# finsied
ifstatements1.py
age = 17
 
if(age > 17):
   print("OK")
elif(age < 18):
   print("Nok")
print(finished)
#Nok
# finished
ifstatements2.py
temp = 37 
 
if(temp >37):
   print("Hot")
elif(temp <37):
   print("Cold")
else:
   print("OK")
print("Finished")
# ok
# finished

input

input.py
name = input("Enter name") # Tom
email = name + "@gmit.ie"
print(email)
# Tom@gmit.ie
salary.py
salary = input("Enter salary") # 30000
salary = int(salary)
salary = salary + 100
 
print(salary)

WHILE statement

while.py
i = 1
while(i <=5):
   print(i)
   i+=1
   # i = i +1
# 1  
# 2 
# 3 
# 4 
# 5 
whilebreak.py
answer = "5"
 
while True:
   guess = input("Pick a number between 1 & 10")
   if(guess==answer):
     print("Correct!")
     break
 
print("end")
 

Arrays

array.py
myArr = ["Jan", "Feb", "March", "April"]
print(myArr)
#['Jan', 'Feb', 'March', 'April']
 
print(myArr[0])
# jan
 
print(len(myArr))
#4

Append()

append.py
myArr = ["Jan", "Feb", "March", "April"]
myArr.append("May")
 
print(myArr)
##['Jan', 'Feb', 'March', 'April', "May"]

FOR Statement

forloop.py
name = ["Tom", "John", "Mary", "Bob"]
 
for name in names:
   print(name + "@gmit.ie")
 
# Tom@gmit.ie
# John@gmit.ie
# Mary@gmit.ie
# Bob@gmit.ie
 
 
myArr = [1, 5, 12]
 
for x in myArr:
   print(x+1)
# 2
# 6
# 13
 
print(myArr)
#[1, 5, 12]

User-defined functions

userfunctions.py
def printMonths():
   print("Jan, Feb, Mar")
 
def printDays():
   print("Mon, Tue, Wed")
 
printDays()
# Mon, Tue, Wed
 
printMonths()
# Jan, Feb, Mar

name

userfunctions.py
def printMonths():
   print("Jan, Feb, Mar")
 
def main():
   printMonths()
 
if __name__ =="__main__":
   # execute only if run as a script
   main() 

Parameters

parameters.py
print("Hello World")
# Hello World
 
print("Test")
# Test
 
 
s = "This is a string"
 
print(len(s))
# 16
parameters1.py
def checkAge(age):
   if age < 18:
      return "Too Young"
   return "Accepted"
 
def main():
   name = input("Enter:")
   age = int(input("Enter Age:"))
   print(name, "is", checkAge(age))
 
if __name__="__main__":
   # execute only if run as a script
   main()

Local Variables

A local variable is a variable that is given local scope. Local variable references in the function or block in which it is declared override the same variable name in the larger scope.

localvariables.py
def checkAge(age):
   limit = 18
   if age < limit:
      return "Too Young"
   return "Accepted"
 
def main():
   name = input("Enter:")
   age = int(input("Enter Age:"))
   print(name, "is", checkAge(age), limit)
 
if __name__="__main__":
   # execute only if run as a script
   main()
 
localvariables1.py
def checkAge(age):
   limit = 18
   if age < limit:
      return "Too Young"
   return "Accepted"
 
def main():
   limit= "Finished"
   name = input("Enter:")
   age = int(input("Enter Age:"))
   print(name, "is", checkAge(age), limit)
 
if __name__="__main__":
   # execute only if run as a script
   main()
 

Global Variables

globalvariables.py
def incrementAge(age):
   age += 1
   print(age)
   # 25
 
def main():
   age = 24
   incrementAge(age)
   print(age)
   # 24
 
if __name__="__main__":
   # execute only if run as a script
   main()
 

Declaring the variable outside of the function and using the keword global to make it a global variable

globalvariables.py
age = 24 
 
def incrementAge(age):
   # access using the keyword "Global"
   global age
   age += 1
   print(age)
   # 25
 
 
def main():
   incrementAge(age)
   print(age)
   #25
 
 
if __name__="__main__":
   # execute only if run as a script
   main()
 

Topic9 - Python II

PyMySQL

  • MySQLdb
  • mysql.connector
  • PyMySQL

connect()

  • The connect() function connects to a MySQL database.
  • host - host where the database server is located
  • user - username to log in as
  • password - Password to use
  • db - Database to use
  • port - Port to use
  • cursorclass- Custom cursor class to use

Connecting to the Database

conn = pymysql.connect( "localhost", "root", "root", "school", 
                 cursorclass=pysql.cursors.DictCursor)
conn = pymysql.connect( "localhost", "root", "root", "school", 
                 cursorclass=pysql.cursors.DictCursor,
                 password="root", 
                 host="localhost",
                 db="school",
                 port=3306)

Executing a query

query = "SELECT * FROM subject"
 
with conn: 
   cursor = conn.cursor()
   cursor.execute(query)
   subjects = cursor.fetchall()
   for s in sujects:
      print(s["Name"])
 

query.py
query = "SELECT * FROM subject
         WHERE teacher LIKE %s"
 
with conn:
   cursor = conn.cursor()
   cursor.execute(query, ("Ms.%"))
   subjects = cursor.fetchall()
   for s in subjects:
      print(s["Name"])

Inserting new data

insertquery.py
ins = "Insert INTO subject
       (Name, Teacher, OnLeavingCert)
       VALUE(%s, %s, %s)"
 
with conn:
   cursor = conn.cursor()
   cursor.execute(ins, ("Maths", "Ms.Jones", 1))
   conn.commit()    # commit to database to make a change   

try and except block for error messages

insertquery2.py
ins = "Insert INTO subject
       (Name, Teacher, OnLeavingCert)
       VALUE(%s, %s, %s)"
 
with conn:
   try:
      cursor = conn.cursor()
      cursor.execute(ins, ("Maths", "Ms.Jones", 1))
      conn.commit()   
      print("Insert successful")
   except:
      print("Maths already exists")

Exceptions

name = "Maths"
teacher = "Ms.Jones"
lc =1
with conn:
   try:
      cursor = conn.cursor()
      cursor.execute(query, (name, teacher, lc))
      conn.commit()
      print("Insert Successful")
    except pymysql.err.InternalError as e:
      print("Internal Error", e)
    except pymysql.err.IntegrityError as e:
      print("Error", name, "already exists")
    except Exception as e:
      print("error", e)

Deleting Data

query = "DELETE FROM subject WHERE name = %s"
name = "Maths"
 
with conn:
   try:
      cursor = conn.cursor()
      rowAffected = cursor.execute(query, (name))
      conn.commit()
      if(rowsAffected ==0):
         print("Nothing deleted - ", name , "never existed")
      else: 
         print(rowAffected, "row(s) deleted")
      except Exception as e:
         print("error", e)

Updating Data

query = "UPDATE subject SET teacher = %s WHERE NAME = %s"
subject = "Maths"
newTeacher = "Mr.Murphy"
 
with conn:
   try:
      cursor = conn.cursor()
      rowsAffected = cursor.execute(query, (newTeacher, subject))
      conn.commit()
      if(rowsAffected ==0):
         print(subject, "not updated")
      else:
         print(subject, "now taught by", newTeacher)
   except Exception as e:
      print("error", e)

Installing PyMySQL

  • open command prompt or terminal
  • type conda install pymysql
  • create a new py file


Topic10 - Python III

pymongo

  • client = pymongo.MongoClient()
  • client = pymongo.MongoClient(host=“localhost”, port=“27017”)
  • try
    • client.admin.command('ismaster')

Database and Collections

  • mydb = myclient[“cars”]
  • cols = mydb.list_collection_names()
  • docs = mydb[“docs”]

find()

  • people = docs.find({“age”:{“$gt”:18}})
  • for person in people:
    • print(person[“Name”])

find()

  • people = docs.find({“age”:{“$gt”:18}}, {“_id”:0})
  • people = docs.find({“age”:{“$gt”:18}}, limit=2)

insert_one()

  • newDoc = {“_id”:991, “name”:“John”, “age”:44}
  • mycol.insert_one(doc)

insert_many()

  • newDocs = [{“_id”:991, “name”:“John”, “age”:44},
    • {“_id”:992, “name”:“Mary”, “age”:24},
    • {“_id”:992, “name”:“Mary”, “age”:35}]
  • mycol.insert_many(newDocs)

Exceptions

  • pymongo.errors.ConnectionFailure
  • pymongo.errors.DuplicateKeyError
  • newDocs = [{“_id”:991, “name”:“John”, “age”:44},
    • {“_id”:992, “name”:“Mary”, “age”:24},
    • {“_id”:992, “name”:“Mary”, “age”:35}]
  • mycol.insert_many(newDocs) mycol.insert_many(newDocs, ordered=False)

delete_one()

  • filter = {“age”:{“$gt”:44}}
  • mycol.delete_one(filter)
  • mycol.delete_one({“age”:{“$gt”:44}})

delete_many()

  • filter = {“age”:{“$gt”:44}}
  • result = mycol.delete_many(filter)
  • DeleteResult
  • print(result.deleted_count)

update_one()

  • filter = {“age”:{“$gt”:44}}
  • update = {“$inc”:{“age”:1}}
  • mycol.update_one(filter, update)

update_many()

  • filter = {“age”:{“$gt”:44}}
  • update = {“$inc”:{“age”:1}}
  • result = mycol.update_many(filter, update)
  • UpdateResult
  • print(result.modified_count)

Review

  • MySQL
  • MongoDB
  • Python

Table of Contents

Table of Contents

1)
Sharding is a type of database partitioning that separates very large databases the into smaller, faster, more easily managed parts called data shards. The word shard means a small part of a whole.
modules/52553.txt · Last modified: 2020/06/20 14:39 by 127.0.0.1