WebFund 2015W Lecture 12

Video

The video from the lecture given on February 23, 2015 is now available.
Notes

On-Screen Notes

Lecture 12
----------

Midterms to be returned Wednesday

Today: MongoDB and databases


Parts of a web application:
* Front End: client-side code (HTML, CSS, JavaScript)
  - runs in the browser
* Back End: server-side code
  - web server: PHP, Python, Java, *JavaScript* (node)
  - database: MySQL, Oracle, PostgreSQL, MongoDB, ...

Early web servers just used a filesystem
 - good for static content
 - bad for concurrent access


Why does concurrency matter?
 - many different web clients trying to modify the "same" resources
 - e.g., passwords stored in a file
 - *really* bad when using multiple web servers
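Here is a minimal Python sketch of the file problem. The file name and passwords are made up. It shows the classic "lost update": two clients read the same password file, each changes a different entry, and each writes the whole file back. The interleaving is simulated step by step rather than run with real threads, so the outcome is deterministic.

```python
import json
import os
import tempfile

# Hypothetical shared password file (made-up name and contents).
path = os.path.join(tempfile.mkdtemp(), "passwords.json")
with open(path, "w") as f:
    json.dump({"alice": "old-a", "bob": "old-b"}, f)

def read_all():
    with open(path) as f:
        return json.load(f)

def write_all(data):
    with open(path, "w") as f:
        json.dump(data, f)

# Simulated interleaving: both clients read before either one writes.
client1 = read_all()
client2 = read_all()

client1["alice"] = "new-a"   # client 1 changes Alice's password
client2["bob"] = "new-b"     # client 2 changes Bob's password

write_all(client1)
write_all(client2)           # overwrites the whole file, including client 1's change

final = read_all()
print(final)  # {'alice': 'old-a', 'bob': 'new-b'}: Alice's change was lost
```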

Databases are good at
 - fine-grained data concurrency
 - but not as easy to "scale" (because concurrency is hard)
 - that's SOMEBODY ELSE's PROBLEM

Why not just have web clients talk to the database?
 (cut out the middle man)
 - security: don't want all clients to access ALL data
    - databases do have access control, but it is generally not suitable for most web applications
    - also, data validation: the server checks input before it reaches the database
 - scalability
    - databases are relatively slow
    - often put a caching layer in front, and serve static content separately
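A caching layer can be as simple as checking a fast in-memory store before asking the database. Below is a rough sketch of the "cache-aside" pattern. `slow_db_lookup` is a hypothetical stand-in for a real query. A real cache would also need to evict or invalidate entries when the underlying data changes.

```python
import time

db_calls = 0  # counts how often the "database" is actually hit

def slow_db_lookup(user_id):
    """Hypothetical slow query; the sleep stands in for database latency."""
    global db_calls
    db_calls += 1
    time.sleep(0.01)
    return {"id": user_id, "name": f"user{user_id}"}

cache = {}

def get_user(user_id):
    """Cache-aside: check the cache first, fall back to the database on a miss."""
    if user_id in cache:
        return cache[user_id]
    record = slow_db_lookup(user_id)
    cache[user_id] = record
    return record

for _ in range(5):
    get_user(42)

print(db_calls)  # 1: only the first request reached the database
```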


Types of databases
------------------
SQL versus NoSQL databases

Actually, there are many kinds of NoSQL databases

The key difference is *transactions*


SQL => relational databases

Data is divided into tables

Tables have columns (fields) and rows (records)

Columns have types (date, string, number, etc.)

Table for customers (fields/columns):
 - Customer ID (primary key)
 - name
 - street
 - city
 - province
 - etc.

Table for invoices
 - Invoice ID (primary key)
 - Customer ID
 - what they purchased
 - amount

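The two tables above could be declared roughly like this. The sketch uses Python's built-in `sqlite3` module. SQLite is a small relational database, used here only because it needs no server, and the column names are illustrative. Note that SQLite treats declared column types more loosely than MySQL or PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,   -- primary key
    name        TEXT NOT NULL,
    street      TEXT,
    city        TEXT,
    province    TEXT
);
CREATE TABLE invoices (
    invoice_id  INTEGER PRIMARY KEY,   -- primary key
    customer_id INTEGER REFERENCES customers(customer_id),
    purchased   TEXT,                  -- what they purchased
    amount      REAL
);
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ada', '1 Main St', 'Ottawa', 'ON')")
conn.execute("INSERT INTO invoices VALUES (100, 1, 'laptop', 1299.99)")

# PRAGMA table_info returns one row per column; index 1 is the column name.
cols = [row[1] for row in conn.execute("PRAGMA table_info(customers)")]
print(cols)  # ['customer_id', 'name', 'street', 'city', 'province']
```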

Relations connect tables

Relations often work best with unique keys
 - fields where every record has a unique value
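A relation is used by joining tables on a shared key. This sketch uses the same hypothetical `customers`/`invoices` tables, again in SQLite. Each invoice's `customer_id` is matched against the unique `customer_id` in `customers` to look up the customer's name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY,
                       customer_id INTEGER REFERENCES customers(customer_id),
                       purchased TEXT, amount REAL);
INSERT INTO customers VALUES (1, 'Ada', 'Ottawa'), (2, 'Grace', 'Toronto');
INSERT INTO invoices VALUES (100, 1, 'laptop', 1299.99),
                            (101, 2, 'mouse', 25.00),
                            (102, 1, 'monitor', 300.00);
""")

# Follow the relation: each invoice row is paired with its customer row.
rows = conn.execute("""
    SELECT invoices.invoice_id, customers.name, invoices.purchased
    FROM invoices
    JOIN customers ON customers.customer_id = invoices.customer_id
    ORDER BY invoices.invoice_id
""").fetchall()
for row in rows:
    print(row)
# (100, 'Ada', 'laptop')
# (101, 'Grace', 'mouse')
# (102, 'Ada', 'monitor')
```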


Scenario
 - creating the first invoice for a new customer
 - problem: the system crashes after adding the invoice but before adding the customer's info
 - a transaction is what makes sure this never happens
   - if only the invoice (or only the customer) change was made, the transaction is not complete and its changes are thrown away
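This Python `sqlite3` sketch shows roughly what goes wrong in this scenario without a transaction. A raised exception stands in for the crash. `isolation_level=None` puts the `sqlite3` module into autocommit mode, so each statement is saved as soon as it runs. The table layout is illustrative.

```python
import sqlite3

# Autocommit mode: every statement is committed immediately (no grouping).
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
""")

def create_first_invoice(crash):
    conn.execute("INSERT INTO invoices VALUES (100, 7, 49.99)")   # saved right away
    if crash:
        raise RuntimeError("simulated crash")                     # server dies here
    conn.execute("INSERT INTO customers VALUES (7, 'New Customer')")

try:
    create_first_invoice(crash=True)
except RuntimeError:
    pass

# Invoices that point at a customer who does not exist.
orphans = conn.execute("""
    SELECT invoice_id FROM invoices
    WHERE customer_id NOT IN (SELECT customer_id FROM customers)
""").fetchall()
print(orphans)  # [(100,)]: an invoice with no customer
```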

Transaction:
 - start transaction
 - make your changes
    - add the invoice
    - add the customer
 - commit transaction if all changes succeeded, otherwise abort (roll back)
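The transaction steps above can be sketched with Python's `sqlite3`. Using the connection as a context manager (`with conn:`) commits if the block finishes and rolls back if it raises. The exception simulates a crash. After a real crash, the database discards uncommitted changes when it recovers. The table layout is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
""")

def create_first_invoice(crash):
    # start transaction; commit on success, roll back if anything raises
    with conn:
        conn.execute("INSERT INTO invoices VALUES (100, 7, 49.99)")      # add the invoice
        if crash:
            raise RuntimeError("simulated crash")
        conn.execute("INSERT INTO customers VALUES (7, 'New Customer')")  # add the customer

try:
    create_first_invoice(crash=True)
except RuntimeError:
    pass
after_crash = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
print(after_crash)  # 0: the half-finished invoice was rolled back

create_first_invoice(crash=False)
invoices = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
customers = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(invoices, customers)  # 1 1: both changes committed together
```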

Relational databases have *overhead* and *complexity* and *security issues*


NoSQL => no transactions

We're using MongoDB
 - "document store"
 - i.e., JSON store (technically BSON, a binary form of JSON)
 - no built-in relations
 - however, MongoDB can run code (written in JavaScript)

In SQL databases
 - fields are pre-defined and relatively rigid

In MongoDB
 - can have any properties in any document
 - no requirement that documents have the same fields
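MongoDB's schema flexibility can be imitated with plain Python dicts. This is not the MongoDB driver API. The list stands in for a collection, and the hypothetical `find` helper only mimics a query such as `db.customers.find({city: "Ottawa"})` in the MongoDB shell. Documents with different fields sit side by side, and a document that lacks the queried field simply doesn't match.

```python
import json

# A list standing in for a MongoDB collection; each document is a JSON-like dict.
customers = [
    {"_id": 1, "name": "Ada", "city": "Ottawa"},
    {"_id": 2, "name": "Grace", "email": "grace@example.com"},      # no city field
    {"_id": 3, "name": "Linus", "city": "Ottawa", "tags": ["vip"]},  # extra field
]

def find(collection, **criteria):
    """Hypothetical helper: return documents whose fields equal every criterion.
    Documents missing a queried field do not match."""
    return [doc for doc in collection
            if all(k in doc and doc[k] == v for k, v in criteria.items())]

print([d["name"] for d in find(customers, city="Ottawa")])  # ['Ada', 'Linus']
print(json.dumps(customers[1]))  # each document serializes directly to JSON
```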

Indexing
--------

 - records/documents are (typically) stored in order of primary key
    - fast to find by primary key
 - what if I want to find records/documents by something that *isn't* a primary key
    - e.g., invoices by what was purchased, customers by city

Basic search
 - if sorted, then you do binary search
 - or store them so you can index directly into an array/hash table
 - otherwise, exhaustive search (check every record)
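The three search options, sketched in Python on a made-up list of customer records kept sorted by customer ID. Binary search works for the primary key, and a dict gives hash-table lookup. A search by city, however, has to scan every record.

```python
import bisect

# Made-up customer records (id, name, city), stored sorted by primary key.
records = [(101, "Ada", "Ottawa"), (205, "Grace", "Toronto"),
           (310, "Linus", "Ottawa"), (412, "Barbara", "Montreal")]
keys = [r[0] for r in records]       # the sorted primary keys
by_id = {r[0]: r for r in records}   # hash table: average O(1) lookup by id

def find_by_id(customer_id):
    """Binary search on the sorted primary key: O(log n) comparisons."""
    i = bisect.bisect_left(keys, customer_id)
    if i < len(keys) and keys[i] == customer_id:
        return records[i]
    return None

def find_by_city(city):
    """City is not the sort key, so every record must be checked: O(n)."""
    return [r for r in records if r[2] == city]

print(find_by_id(310))          # (310, 'Linus', 'Ottawa')
print(by_id[205])               # (205, 'Grace', 'Toronto')
print(find_by_city("Ottawa"))   # [(101, 'Ada', 'Ottawa'), (310, 'Linus', 'Ottawa')]
```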

Indexing is a way to get around exhaustive search

An index is a data structure associating:
 * a field or fields with
 * the primary key

Index on city in the customers table
 - sorted list of (city, customer ID) pairs
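Here is a sketch of such a city index, using a sorted Python list of `(city, customer_id)` pairs built from made-up customers. Finding a city becomes a binary search plus a short forward scan instead of an exhaustive search. Real databases typically use B-trees rather than a flat sorted list.

```python
import bisect

# Made-up customer documents; "id" plays the role of the primary key.
customers = [
    {"id": 101, "name": "Ada", "city": "Ottawa"},
    {"id": 205, "name": "Grace", "city": "Toronto"},
    {"id": 310, "name": "Linus", "city": "Ottawa"},
    {"id": 412, "name": "Barbara", "city": "Montreal"},
]

# The index: (city, primary key) pairs, sorted by city first.
city_index = sorted((c["city"], c["id"]) for c in customers)

def ids_in_city(city):
    """Binary-search for the first entry of `city`, then scan forward
    while the city still matches."""
    i = bisect.bisect_left(city_index, (city,))  # (city,) sorts before (city, any_id)
    result = []
    while i < len(city_index) and city_index[i][0] == city:
        result.append(city_index[i][1])
        i += 1
    return result

print(ids_in_city("Ottawa"))  # [101, 310]
```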

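Why not index on every field?
 - space: every index is an extra data structure that must be stored
 - time of insert/add operation: every index must be updated on each insert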
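A toy illustration of both costs, built around a hypothetical `Table` class that keeps one sorted index per indexed field. Every insert must update every index, and every index holds one extra entry per row.

```python
import bisect

class Table:
    """Hypothetical toy table keeping one sorted (value, primary_key) index
    per indexed field. Real databases typically use B-trees instead."""

    def __init__(self, indexed_fields):
        self.rows = {}                                   # primary key -> record
        self.indexes = {field: [] for field in indexed_fields}
        self.index_writes = 0                            # index updates caused by inserts

    def insert(self, pk, record):
        self.rows[pk] = record
        # Time cost: each insert must also update every index.
        for field, index in self.indexes.items():
            bisect.insort(index, (record[field], pk))
            self.index_writes += 1

rec = {"name": "Ada", "street": "1 Main St", "city": "Ottawa", "province": "ON"}

one_index = Table(["city"])
all_indexes = Table(["name", "street", "city", "province"])
for pk in range(1000):
    one_index.insert(pk, rec)
    all_indexes.insert(pk, rec)

print(one_index.index_writes, all_indexes.index_writes)  # 1000 4000
# Space cost: each index stores one (value, key) entry per row.
index_entries = sum(len(ix) for ix in all_indexes.indexes.values())
print(index_entries)  # 4000
```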