WebFund 2024F Lecture 7

From Soma-notes

Video

Video from the lecture for October 1, 2024 is now available:

Notes

Lecture 7
---------

We've got lots of languages on the web
 - HTML
 - CSS
 - JavaScript (client side)
 - SQL
 - (server side language, which could be JavaScript or almost anything else)
   (Python, Java, C#, Ruby, JavaScript)

All of these are programming languages
 - Normally we think of a full "programming language" as a language that is Turing complete
    - you can implement a Turing machine in it
    - classic model of computation
 - but really, a programming language is just a language that can be used to give a computer instructions on what to do
    - most config files can be thought of as being written in a programming languge
    - more specialized languages are known as "domain specific languages", or DSLs

There is a tendency for DSLs to become Turing complete, but generally that is a sign they have gotten too fancy

A classic strategy for solving problems in CS is:
 - define a DSL for the problem space
 - create a solution in the DSL

A programming language is a language for solving problems
 - different problems "fit" better with different vocabularies, syntax, semantics

For example, C has atrocious string handling capabilities
 - but you can make it do anything, and people do

Is a markup language a programming language?
 - I would say yes!
 - but generally, it isn't Turing complete and that is okay

Why do we have so many programming languages on the web?
 - because there are many sub-problems to solve!

When you define a level of abstraction, implicitly you define a language
 - you may implement that language using objects or an API, but at its heart it is a language just maybe with bad syntax

Original idea of web tech:

HTML - structure, content of a page
CSS - styling of a page, so a language for talking about the style of HTML
JavaScript - make a page dynamic! (drop down menus, scroll views, etc)

(And this all was served from static pages on the server)


But what if we want the server to do computation, maintain state beyond a set of files?

CGI - common gateway interface
 - early form of dynamic web
 - standard for programs the server would run on request

So, when you requested certain URLs, the server would run a Perl program and return its output rather than return a static page

And then websites became more Perl scripts than files, and so we got Apache keeping the Perl runtime loaded (to make things faster)
 - Apache was the "main" web server application for many years, still widely used
 - name came from "a patchy server", it was a set of patches on an earlier web server

Then we got PHP, a language designed for adding server-side code to HTML documents
 - unlike Perl, PHP was created with the web in mind and was designed to be
   mixed with HTML in the same program


JavaScript was first created as a client side language, but then developers wanted to move code & data between the server and the client
 - so why not use the same language on both ends?
 - also, JavaScript runtimes got really fast, much faster than Python, Perl, and most PHP runtimes
 - that's how we got node.js and then deno



But JavaScript on the server was part of a big change in how web applications were structured

What we've seen so far is that the server sends HTML, CSS, and soon JavaScript to the client, and the client sends back GET and POST requests, the latter with user data.

Today, we more send back and forth objects in the form of JSON
 - GET, POST, other methods are mostly transport for JSON objects

It was a twisty path to get here and we're going to skip over most of the ugly parts (e.g. XML, SOAP mostly)

The early web was mostly one of a LAMP stack
 - Linux
 - Apache
 - MySQL
 - PHP/Perl (earlier more Perl, later more PHP)

So we haven't talked about the third, MySQL, i.e., databases.

MySQL was (initially) a bad but fast implementation of SQL
SQL is an old language of relational databases

Databases are a solution to an eternal problem - how to manage state
 - i.e., how do you manage information that changes over time

Mostly up until now you've dealt with files

What is the API of files?
 - open, read, write, seek, close
 - we mostly open a file, read its contents, and then close it...or
 - we open a file, write stuff to it, then close it

But really, they are an example of a hierarchical key/value store
 - the key is the filename
 - the store is the contents of the file
 - the hierarchy is that of directories and subdirectories

While you can do incremental changes to files, it gets complicated fast, especially if more than one program wants to modify a file
 - typically the last writer will overwrite all previous writers

If you've ever seen an editor complain about a file having changed on disk when you try saving, you're seeing the risks of multiple concurrent editors of a file.

Files are great as long as they are mostly being read, or there is clearly only one writer at a time.

This is NOT the situation with a dynamic web server!
 - every requesting client could potentially require changes to stored data
 - 1000 web browsers all contacting the same web server, sending POST messages
   is entirely realistic

This was a really big problem with early web servers because every incoming request was handled by a new process
 - followed a "fork on request" model
 - later turned into "new thread on request" or "thread from threadpool" model

(Think about this more for COMP 3000)

Don't confuse how we test web apps with how they normally run
 - won't just be on localhost
 - won't just have one person accesing them

SQL (structured query language) is a language designed for interacting with relational databases
 - relational databases were first developed in the 1960's
 - so when the web arose in the 1990's, was already and old and mature tech
 - web developers took it off the shelf and used it for their own purposes

Relational databases were originally created for businesses & govt, storing data on people, money, etc.


Key data structure in relational databases is a table
 - rows and columns
 - columns are fields (categories), e.g. name, address, etc
 - rows are the values of the fields associated with a specific record
    - an individual row is a record

You can think of it as a spreadsheet sheet, and indeed spreadsheets can be accessed as databases
 - but have to have a rigid structure of labeled columns

But where does the "relational" come in?
 - relations are connections between tables
 - an invoice table may have a relation with a customer table
    - the invoice just has a customer ID
    - to get all the info on the customer, look in the customer table
    - thus, invoices won't duplicate customer information

When you retrieve data from a database to build a report, you'll incorporate data from multiple tables using relations to get a coherent view

SQL is the common language for accessing and manipulating relational databases
 - THE standard

So we're going to do VERY BASIC SQL in this course
 - take 3005 to learn more about databases!

Since it is a standard, there are many implementations of SQL, many database engines
 - Oracle
 - IBM DB2
 - MS SQL Server
 - AWS, Azure, Google Cloud have multiple database offerings
 - many other cloud DBs

 - MySQL, MariaDB  <--- forks
 - Postgres (open source successor to Ingres)
    - become the standard API that cloud databases emulate
 - SQLite

SQLite is not like the rest
 - in-process database