WebFund 2024F Lecture 15

Video

Video from the lecture for November 7, 2024 is now available:

Notes

Lecture 15
----------
- I am behind on Teams messages & email
  - should be caught up by Saturday afternoon
- Assignment 3 coming out in the next day
  - based on Tutorial 7 code
- note all current tutorials are due next week
  - please get caught up!

Why can't you call await inside of a regular function?
 - regular functions must run to completion; they cannot pause
    - async functions can be paused and then resume execution at some arbitrary later time
 - callers of regular functions expect them to complete!
 - so if you want to do something async from a synchronous function,
   you have to use a callback function that will be run once the async
   function completes

The then method registers a callback function on a returned promise. It does not resolve the promise itself; the callback is called with the promise's value once the promise is fulfilled.


So if you are calling a function or method that is asynchronous (and thus always returns a promise), you can either
 * use await if you are calling it inside of an async function
 * use then if you are calling it inside of a synchronous function
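
A minimal sketch of the two options. fetchGreeting is a made-up function standing in for any asynchronous call (a database query, a network request); all of the names here are for illustration only.

```javascript
// Hypothetical async operation standing in for a database or network call.
// It returns a promise that is fulfilled about 10 ms later.
function fetchGreeting(name) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(`hello, ${name}`), 10);
  });
}

// Option 1: inside an async function, await pauses this function
// (and only this function) until the promise is fulfilled.
async function greetWithAwait(name) {
  const greeting = await fetchGreeting(name);
  return greeting.toUpperCase();
}

// Option 2: a regular function cannot pause, so it registers a callback
// with then and returns right away, before the greeting is available.
function greetWithThen(name, onDone) {
  fetchGreeting(name).then((greeting) => {
    onDone(greeting.toUpperCase());
  });
}

// The top level of a script is synchronous code, so it uses callbacks too.
greetWithAwait("await").then((result) => console.log(result));
greetWithThen("then", (result) => console.log(result));
```

Note that greetWithThen has no useful return value: the result only ever exists inside the callback.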

So why is req.json() asynchronous?
 - we have the request, isn't the body just part of that object?
 - NO, it is potentially still being sent by the client (browser)
    - it could be a big file being uploaded!
 - so the request object can be used to get the body, but it isn't there yet
    - we use methods like .formData() and .json() to get the body data and process it
 - thus, when you're debugging and you look at the request, you won't see
   the request body there - you have to ask for it explicitly instead
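
A small sketch of what this looks like in a handler. Request and Response are the standard Fetch API objects that Deno passes to and expects from a handler; handleSubmit, the URL, and the "name" field are made up for this example, and the request is built by hand so the code runs without a real client.

```javascript
// Hypothetical handler: takes a Request, returns a promise of a Response.
async function handleSubmit(req) {
  // The body has not been read yet; req.body is a stream, not the data.
  const wasRead = req.bodyUsed; // false at this point

  // json() returns a promise; await pauses this handler until the whole
  // body has arrived and been parsed.
  const data = await req.json();

  return new Response(`hello ${data.name}, body read before json(): ${wasRead}`);
}

// Build a request by hand so the sketch runs without a browser.
const demoRequest = new Request("http://localhost/submit", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ name: "Ada" }),
});

handleSubmit(demoRequest)
  .then((res) => res.text())
  .then((text) => console.log(text));
// prints "hello Ada, body read before json(): false"
```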

Can you use .then on a promise in an async function?
 - yes, but it makes your code unnecessarily complex most of the time
 - instead, just use await and then your code reads sequentially
   rather than having a callback function stuck in the middle of it

JavaScript code, particularly on the server, often got stuck in "callback hell" before the introduction of async/await
 - callbacks nested inside callbacks nested inside callbacks
 - became very hard to follow the flow of control of the code

The .then method makes callbacks cleaner when used with promises, but still not as good as async/await.
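
To make the comparison concrete, here is the same three-step computation written all three ways. doubleLater is a made-up callback-style function standing in for any slow operation; the other names are also just for illustration.

```javascript
// Hypothetical callback-style step: doubles a number after a short delay.
function doubleLater(n, callback) {
  setTimeout(() => callback(n * 2), 5);
}

// The same step wrapped so that it returns a promise instead.
function doubleAsync(n) {
  return new Promise((resolve) => doubleLater(n, resolve));
}

// 1) Nested callbacks: each step is buried one level deeper.
function doubleThriceNested(n, done) {
  doubleLater(n, (a) => {
    doubleLater(a, (b) => {
      doubleLater(b, (c) => {
        done(c);
      });
    });
  });
}

// 2) Promise chain with then: flat, but still built out of callbacks.
function doubleThriceChained(n) {
  return doubleAsync(n)
    .then((a) => doubleAsync(a))
    .then((b) => doubleAsync(b));
}

// 3) async/await: reads top to bottom like sequential code.
async function doubleThriceAwait(n) {
  const a = await doubleAsync(n);
  const b = await doubleAsync(a);
  const c = await doubleAsync(b);
  return c;
}

doubleThriceNested(1, (result) => console.log("nested:", result));   // nested: 8
doubleThriceChained(1).then((result) => console.log("chained:", result)); // chained: 8
doubleThriceAwait(1).then((result) => console.log("await:", result));     // await: 8
```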


Remember that web servers are highly concurrent
 - you have many clients connecting "at once" (multiple requests can be pending)
 - calls to the database can take arbitrary time
 - calls to the filesystem can take arbitrary time

So when this happens you have two choices
 - synchronous: your code pauses until the requested operation finishes and doesn't do anything else in the meantime
 - asynchronous: only the code waiting on the operation pauses; other parts of the app can keep running

Deno implements what is known as "cooperative multitasking"
 - different execution contexts pause their work when they are waiting for data but don't block the execution of other code in the app
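
A small sketch of that interleaving. The sleep helper and the worker function are made up for this example; sleep stands in for any slow operation such as a database or filesystem call.

```javascript
// Helper: a promise that is fulfilled after ms milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Each worker records when it starts and finishes. While one worker is
// paused at its await, the other is free to run.
async function worker(name, delayMs, log) {
  log.push(`${name} start`);
  await sleep(delayMs); // this worker pauses here; nothing else is blocked
  log.push(`${name} done`);
}

async function demo() {
  const log = [];
  // Start both workers, then wait for both to finish.
  await Promise.all([worker("A", 50, log), worker("B", 10, log)]);
  return log;
}

demo().then((log) => console.log(log));
// prints [ 'A start', 'B start', 'B done', 'A done' ] (exact formatting varies by runtime)
```

B starts before A has finished and finishes first, even though there is only one thread running JavaScript. A only gives up control at its await, which is what makes the multitasking "cooperative".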

Compare this to code in standard Java, Python, or C
 - when you open or read a file, or access the network,
   those calls are blocking, meaning
    - the code executes sequentially
    - when making a call that may not complete immediately, the
      rest of the app pauses until that action finishes, e.g.
      a read() from a file pauses execution until the read finishes,
      no other code runs

What if you want your code to do something useful while operations are pending, in standard Java, Python, or C?
 - you use threads or processes
   - i.e., you logically have multiple copies of the program running at the same time
   - each one runs independently
   - with threads, they share memory and so can coordinate that way
   - with processes, they each have exclusive copies of memory, and so
     coordination has to happen in some other way (sockets, files)

With the pthread library in C, threads can be implemented in multiple ways
 - kernel & userspace threads, or a mix of the two
 - beyond the scope of this class, but pretty darn complex

But what we've found is that
 1) processes are heavyweight and make coordination hard
 2) threads are a pain because interrupting execution means that
    threads interleave in arbitrary ways, meaning we need locks
    on data structures to prevent corruption
 3) cooperative multitasking is more efficient and easier to program,
    when done with the right abstractions
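
A sketch of what point 3 buys you compared to point 2, with made-up names. Code between two awaits runs without interruption, so a read-modify-write with no await in the middle needs no lock. The second function shows that care is still needed: if you put an await in the middle of an update, other code can run in the gap.

```javascript
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Safe without locks: the read and the write happen with no await between
// them, so no other task can run in the middle of the update.
async function safeIncrement(state) {
  await sleep(1);
  state.counter = state.counter + 1;
}

// Lost updates: the await between the read and the write lets every other
// task read the same old value before anyone writes.
async function racyIncrement(state) {
  const seen = state.counter; // read
  await sleep(1);             // other tasks run here
  state.counter = seen + 1;   // write based on a stale read
}

// Start n tasks at once on a fresh counter and report the final value.
async function runAll(incrementFn, n) {
  const state = { counter: 0 };
  await Promise.all(Array.from({ length: n }, () => incrementFn(state)));
  return state.counter;
}

runAll(safeIncrement, 100).then((total) => console.log("safe:", total)); // safe: 100
runAll(racyIncrement, 100).then((total) => console.log("racy:", total)); // racy: 1
```

With preemptive threads, even the first version would need a lock, because a thread can be interrupted between the read and the write.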

Deno implements cooperative multitasking with callback functions, promises (and the syntactic sugar of async/await)

This is a complex topic, covered more in operating systems.

But just know
 - in the first web servers (and many other UNIX servers), every incoming request caused a new process to be created, with that new child process handling the incoming request while the parent waited for new incoming connections
 - web servers on Windows couldn't do this because process creation was too expensive, so they made a new thread for each incoming request. UNIX servers adopted the same architecture to keep up in performance
 - later it was realized threads were too expensive to create so often,
   so they made "thread pools" of already existing threads that would
   take care of a request and then go back to the pool to wait for new ones
 - Node.js demonstrated that cooperative multitasking could be used to make
   simpler and extremely fast web servers. Deno followed its lead
     - this was possible because Node developed a new async-first approach
       to I/O

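Oddly enough, JavaScript was the perfect language for making a cooperative multitasking web server because it originated in the web browser, and so it had no standard facilities for dealing with files or regular network I/O. Node could therefore define its I/O APIs as async-first, with no existing blocking APIs to stay compatible with.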

- today you can do async I/O (facilitating cooperative multitasking) in Java, Python, and C/C++. However, it is not the default, so you end up mixing sync and async code, which can cause problems without considerable care
 

State and HTTP
 - In general, APIs can be stateful or stateless
 - stateful APIs generally allow for smaller requests and can improve performance through tricks like caching
 - stateless APIs however can allow for greater parallelism
   because any necessary context is included with the request


Classic UNIX file API is stateful
 - you have to open a file, get a file descriptor
 - that file descriptor has to be used for subsequent operations
 - and that file descriptor should be released when no longer needed
   (i.e., closed), although that happens automatically when a process terminates

Why is it stateful?
 - because there's a good bit of work required when finding the data
   associated with a filename
 - using a file descriptor allows this work to be amortized across multiple
   operations (multiple reads and writes)
 - but then we need to tell the OS that we no longer need to access the file
   so it can release the associated resources; that's why files should be closed
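
A toy model of the difference, not the real OS API. The in-memory "filesystem", the file name, and the functions openFile, readFile, closeFile, and readAt are all made up for this sketch; they only mimic the shape of a file-descriptor API next to a stateless alternative.

```javascript
// Toy in-memory "filesystem": file name -> contents.
const files = new Map([["notes.txt", "hello world"]]);

// --- Stateful API: the "OS" remembers each open file and its position ---
const openFiles = new Map(); // file descriptor -> { data, offset }
let nextFd = 3; // on UNIX, 0, 1, and 2 are stdin, stdout, and stderr

function openFile(name) {
  if (!files.has(name)) throw new Error(`no such file: ${name}`);
  const fd = nextFd++;
  // The lookup by name happens once, here, and is reused by every read.
  openFiles.set(fd, { data: files.get(name), offset: 0 });
  return fd;
}

function readFile(fd, count) {
  const entry = openFiles.get(fd);
  if (!entry) throw new Error(`bad file descriptor: ${fd}`);
  const chunk = entry.data.slice(entry.offset, entry.offset + count);
  entry.offset += chunk.length; // state: the next read continues from here
  return chunk;
}

function closeFile(fd) {
  openFiles.delete(fd); // release the resources tied to this descriptor
}

// --- Stateless API: every call carries all of the context it needs ---
function readAt(name, offset, count) {
  if (!files.has(name)) throw new Error(`no such file: ${name}`);
  return files.get(name).slice(offset, offset + count); // lookup every time
}

const fd = openFile("notes.txt");
console.log(readFile(fd, 5)); // "hello"
console.log(readFile(fd, 6)); // " world"
closeFile(fd);
console.log(readAt("notes.txt", 6, 5)); // "world"
```

The stateful calls are smaller (just a descriptor and a count), but they only work against the one table that remembers the descriptor. The stateless call repeats the lookup each time, but it could be answered by anything that has a copy of the files.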

HTTP, however, is a stateless protocol
 - every request is independent, GET /list can be understood
   without reference to any other HTTP request
 - this has been very useful for scaling the web
    - because you can do things like replicate web servers
      and any server can handle any incoming request

HOWEVER, in general we need state in a web app (as opposed to just serving static web documents). So what do we do?
 - this is where cookies come in
 - not the only solution, but a very common one

We need some way to tie groups of requests together, to indicate that they are associated with the same browser/same user

Basic idea
 - server sends some data to the client on first connection
 - on later connections, the client sends this data back to the server
   so the server knows who it is talking to

It is like a "ticket" that is given out and then later can be used to show that you've paid for a service.

Whenever you "log in" to a website, there is state, and that state is maintained using things like cookies.
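
A minimal sketch of the basic idea, using the standard Request and Response objects. The cookie name "session", the visit counter, and the helper names are all made up for this example. The session ids are deliberately simplistic: a real app must use random, unguessable ids.

```javascript
// Toy session store: session id -> data kept on the server.
const sessions = new Map();
let nextSessionNumber = 1; // NOT secure: predictable ids, for illustration only

// Pull one cookie's value out of the request's Cookie header, or null.
function getCookie(req, name) {
  const header = req.headers.get("cookie") ?? "";
  for (const part of header.split(";")) {
    const [key, ...rest] = part.trim().split("=");
    if (key === name) return rest.join("=");
  }
  return null;
}

function handler(req) {
  const id = getCookie(req, "session");

  // Later connections: the client sent the cookie back, so we know who it is.
  if (id !== null && sessions.has(id)) {
    const session = sessions.get(id);
    session.visits += 1;
    return new Response(`welcome back, visit ${session.visits}`);
  }

  // First connection (or unknown cookie): hand out a new "ticket".
  const newId = `s${nextSessionNumber++}`;
  sessions.set(newId, { visits: 1 });
  return new Response("first visit", {
    headers: { "set-cookie": `session=${newId}; HttpOnly; Path=/` },
  });
}

// Simulate a browser: the first request has no cookie, the second sends it back.
const first = handler(new Request("http://localhost/"));
const ticket = first.headers.get("set-cookie").split(";")[0];
const second = handler(
  new Request("http://localhost/", { headers: { cookie: ticket } }),
);
second.text().then((text) => console.log(text)); // welcome back, visit 2
```

In Deno, a handler with this shape could be passed to Deno.serve. Each request is still independent as far as HTTP is concerned; the cookie is what lets the server connect it to earlier ones.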