3 Introduction to SQL

3.1 Overview of the original version of SQL

IBM originally developed SQL, then called Sequel, in 1970s
in 1986 - American National Standards Institute (ANSI) & International Organization for Standarization (ISO) published an SQL standard called SQL-86
- followed by SQL-89, SQL-92, SQL:1999, SQL:2003, SQL:2006, SQL:2008, SQL:2011, & most recently SQL:2016

\[\\[.1cm]\]

SQL has several parts:
1. DDL - commands for defining, deleting, & modifying relation schemas
2. DML - provides ability to query info. from database, & to insert, delete, & modify tuples
3. Integrity - commands for specfying integrity constraints that the data must satisfy
4. View definitions - SQL DDL has commands fro defining views
5. Transactions - commands for specfying beginning & end of transactions
6. Embedded SQL & dynamics SQL - defines how SQL can be embedded w/general programming languages
7. Authorizations - SQL DDL includes commands for specifying access rights to views and relations

3.2 SQL Data Defintion

SQL buit-in types:

char(n) - n length character string
varchar(n) - n maximum length varying character string
int/integer - finite subset of integer
smallint - a small integer
numeric(p, d) - a fixed point number w/user-specified precision; p is the number of digits & d is the number of digits after the decimal point
```
     Ex. numeric(3,1) can store 44.5 but NOT 444.5
```
real, double - floating point & double precision floating numbers
float(n) - floating number w/at least n digits
nvarchar - stores multilingual data

When comparing a char type to a varchar type, spaces will be added DEPENDING on the database. Thus a comparision of A (char type) = A (varchar type) may return false, so ALWAYS use varchar.

\[\\[.1cm]\]

create table - command to create a relation
- general form is

where r = name of table, A_i = attribute, & D_i = domain of attribute, type of attribute, & constraints \[\\[.1cm]\]

Query Examples

\[\\[.1cm]\]

drop table r - deletes all info. about table r and removes it from the database
delete from r - deletes all the rows in table r but does NOT delete it from the database
alter table r add A D - adds atribute A of type D to an existing table r & assigns it null values (the default)
alter table r drop A - drop attribute A from table r
- many database systems do NOT allow attributes to be dropped despite allowing tables to be dropped

3.3 Basic structure of SQL queries

Basic structure consists of 3 clauses SELECT, FROM, $ WHERE
SELECT allows duplicates in results
- use SELECT DISTINCT to remove duplicates
- use SELECT ALL to explicitly specify duplicates are NOT removed
- SELECT also allows arthmetic expressions w/+, -, *, & /

\[\\[.1cm]\]

WHERE - specify the requirement/predicate involving attributes
- allows logical connectiveness AND, OR, & NOT
- operands can be expressions w/comparsion operators <, <=, >, >=, =, & <>

\[\\[.1cm]\]

General meaning of a SQL query:
1. generate a cartesian product of the tables listed in the FROM clause
2. apply the predicates specified in the WHERE clause on the result of step 1
3. for each row in the result of step 2, output the attributes specified in the SELECT clause

This is NOT how SQL queries are executed. A real implementation would only generate elements of the cartesion product that satisfy the where clause predicates.

3.4 Additional Basic Operations

as clause - used to rename the attributes of a resulting relation
- can be used w/select & from clause

\[\\[.1cm]\]

correlation name - an identifier that is used to rename a relation
- a.k.a table alias, correlation variable, tuple variable

\[\\[.1cm]\]

String Operators:
- upper(s) - makes string ‘s’ all uppercase
- lower(s) - converts string ‘s’ to all lowercase
- trim(s) - removes the space at the end
- like - operator used for pattern matching \[\\[.1cm]\]
LIKE - used to express patterns
- there are 2 special characters for describing patterns: % and _
- % matches any substring
- _ matches any character
- patterns are case SENSITIVE
- ‘Intro%’ matches any string beginning w/“Intro”
- ‘%Comp%’ mathches any string containing “Comp” as a substring
- ’___’ mathces any string w/exactly 3 characters
- ’___%’ - matches any string of at least 3 characters
```
Ex. "Find all the names of all departments whose building names includes the substring 'Watons'.
SELECT dept_name
FROM department
WHERE building LIKE '%Watson%' 
```

\[\\[.1cm]\]

escape keyword - used immediately before % and _ to indicate that it is to be treated like a normal character

  Ex.
     1. LIKE 'ab\%cd%'escape'\' matches all strings beginning w/"ab%cd" 
     2. LIKE 'ab\\cd%'escape'\' matches all strings beginning w/"ab\cd"

NOT LIKE - used to search for mismatches
* (asterisk) - used w/select clause to mean all attributes
ORDER BY - causes the tuples in the result of a query to appear in a sorted order
- default is in ascending order so use desc for descending order
- can be used on multiple attributes
```
Ex.
SELECT * 
FROM instructor
ORDER BY salary desc, name asc; 
```

\[\\[.1cm]\]

BETWEEN - a comparison operator to simplify the where clause that specify that value is less than or equal to a value and greater than or equal to some value

  Ex.
  Write this:
  SELECT name 
  FROM instructor 
  WHERE salary BETWEEN 90000 and 100000; 

  Instead of this:
  SELECT name 
  FROM instructor 
  WHERE salary <= 100000 AND salary >= 90000;

\[\\[.1cm]\]

ROW CONSTRUCTOR - allows comparison operators to be used on tuples; ($a$₁, $a$₂) <= ($b$₁, $b$₂) is true if $a$₁ <= $b$₁ & $a$₂ <= $b$₂

  Ex. 
  Write this:
  SELECT name, course_id 
  FROM instructor, teaches
  WHERE (instructor.ID, dept_name) = (teaches.ID, 'Biology');

  Instead of this:
  SELECT name, course_id 
  FROM instructor, teaches
  WHERE instructor.ID = teaches.ID and dept_name = 'Biology';

3.5 Set Operations

union - automatically remove duplicates
- use UNION ALL to retain all duplicates

\[\\[.1cm]\]

intersect - tuple in common btw 2 tables; automatically remove duplicates
- use INTERSECT ALL to retain all duplicates
- note that INTERSECT ALL returns the minimum number of duplicates in the 2 tables
```
Ex. So if 4 sections of ICP 101 were taught in Fall 2017 and 2 sections taught in 
Spring 2018, then there would be 2 tuples with ICP 101 in the resulting table 
```
  \[\\[.1cm]\]
except - outputs all tuples from the first table that is not in the second table; perfroms set difference
- use EXCEPT ALL to retain duplicates

EXCEPT ALL returns the number of duplicates in table 1 minus the number of duplicates in table 2 if the difference is POSITIVE.

3.6 Null Values

NULL VALUES - presents problems for arithmetic operations, comparisons, and set operations
the result of any arithmetic expressions (+, -, *, or /) is null if any of the input values is null
the result of any comparison w/null values is unknown
- boolean operators are extended to deal w/unknowns
- and:
  1. result of true and unknown is unknown
  2. false and unknown is false
  3. unknown and unknown is unknown
- or:
  1. true or unknown is true
  2. false or unknown is unknown
  3. unknown or unknown is unknown
- not:
  1. result of not unknown is unknown

\[\\[.1cm]\]

null - a special keyword that can be used in the where clause to test for null values

  Ex. 
  WHERE salary is null 

  Alternatively, 
  WHERE salary is not null -> will select not null values

\[\\[.1cm]\]

is unknown - another special keyword to test whether the result of a comparison is unknown

  Ex. 
  WHERE salary > 100000 is unknown 

  Alternatively, 
  WHERE salary > 100000 is not unknown

unknown & not unknown are not supported by several databases

3.7 Aggregate Functions

aggregate functions take a collection of (a set or multiset) of values as input & returns a single value
- SQL offers 5 standard built-in functions
- avg -> average
- min -> minimum
- max -> maximum
- sum -> total
- count -> count

sum & avg must have numeric inputs, other functions can operate on nonnumeric datatypes

\[\\[.1cm]\]

GROUP BY - forms groups using specified attribute(s)

  Ex. 
  SELECT dept_name, avg(salary) as avg_salary 
  FROM instructors 
  GROUP BY dept_name;

\[\\[.1cm]\]

HAVING clause - applies conditions to each group rather than on a single row
```
  Ex. 
  SELECT dept_name, avg(salary) as avg_salary 
  FROM instructors 
  GROUP BY dept_name
  HAVING avg(salary) > 42000;
```
- sequence of operations for queries w/group by & having clauses:
  1. FROM is evaluated first
  2. if present, WHERE is applied on the resulting table of the from clause
  3. rows satisfying the WHERE clause are placed into groups in the GROUP BY clause
  4. if present, HAVING clause is applied to each group & groups that do not satisfy the HAVING clause are removed
  5. SELECT generates a single tuple for each group by applying the aggregate function

all aggregate functions except count(*) ignore null in their inputs

the count of an empty set is 0, while the result of other functions is null

3.8 Nested Subqueries

subquery - a SELECT-FROM-WHERE expression that is nested within another query
correlated subquery - a subquery that uses a correlation name from an outer query

3.9 Modification of the Databases

DELETE - used to remove whole tuples only
- DELETE cannot remove attributes, must be the entire row
```
General form:
DELETE FROM r
WHERE p; 
```
- p is the predicate & r is the relation
- DELETE first finds all tuples, t in r for which p(t) is true and then deletes it from r
- WHERE can be omitted, in which all rows in r are deleted

DELETE can only operate on one relation

if all rows are deleted, the table still exists but is EMPTY

\[\\[.1cm]\]

INSERT - allows us to add a single tuple or a set of tuples into a relation
- tuples inserted must have the correct num. of attributes

\[\\[.1cm]\]

UPDATE - can change a value in a tuple w/o changing all values

\[\\[.1cm]\]

CASE - allows us to perform multiple updates in a single UPDATE statement, thereby avoiding the problem w/the order of tuples

  Ex. 
  general form:
  UPDATE table
  SET attribute = CASE
                    WHEN pred1 then result1
                    WHEN pred2 then result2
                    ...
                    WHEN predN then resultN
                    else result0
                    end