WebFund 2015W Lecture 19


Video

The video for the lecture given on Wednesday, March 18, 2015 is now available.

Notes

Assignment due dates

Assignments 5-7 can be submitted late until Monday, March 23 (with a small grade reduction).

The due date for Assignment 8 has been moved to Wednesday, March 25.

Regular expressions

In order to understand what a regular expression really is, we need a bit of background information.

The Chomsky hierarchy classifies the formal grammars you can create. Grammars are grouped by how “powerful” they are, that is, how much variation the languages they describe can have. A formal grammar is a set of production rules for strings; the rules describe which strings are syntactically valid in the language. A formal grammar consists of the following:

  • A finite set of nonterminal symbols, which can still be rewritten by some production rule.
  • A finite set of terminal symbols, which cannot be rewritten by any production rule.
  • A finite set of production rules.
  • A start symbol.

(Formal grammar definition referenced from Wikipedia.)

The hierarchy breaks grammars down into the following four groups, listed from least to most powerful:

  • Type 3 - Regular: Think of it as a grammar that can be parsed by a little machine with a finite number of states (a finite automaton); it can’t remember an unbounded amount of information. Regular expressions describe this class.
  • Type 2 - Context Free: Parsed with the help of a stack. This encompasses the syntax of most programming languages.
  • Type 1 - Context Sensitive: What a computer can do with a bounded amount of space (roughly, space proportional to the size of the input).
  • Type 0 - Recursively Enumerable: Anything a computer can do.

What is a finite automaton? A simple example of a deterministic finite automaton is a machine with three states (0, 1 and 2) that reads its input one symbol at a time, where each symbol is a 0 or a 1. See the Wikipedia example.

  • If we are in state 0 and receive a 0, stay in state 0.
  • If we are in state 0 and receive a 1, go to state 1.
  • If we are in state 1 and receive a 0, go to state 2.
  • If we are in state 1 and receive a 1, go to state 0.
  • If we are in state 2 and receive a 0, go to state 1.
  • If we are in state 2 and receive a 1, stay in state 2.
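The six transitions listed above can be written as a lookup table. The following is a minimal JavaScript sketch, not code from the lecture: the names transitions, run and accepts are made up for illustration, and treating state 0 as both the start state and the only accepting state is an assumption, since the notes do not say which states accept. Under that assumption the machine accepts the binary numbers that are multiples of 3.

```javascript
// transitions[state][symbol] gives the next state (taken from the six rules).
var transitions = {
  0: { "0": 0, "1": 1 },
  1: { "0": 2, "1": 0 },
  2: { "0": 1, "1": 2 }
};

// Feed a string of 0s and 1s through the automaton and return the final state.
function run(input) {
  var state = 0; // assumption: the machine starts in state 0
  for (var i = 0; i < input.length; i++) {
    var symbol = input[i];
    if (!(symbol in transitions[state])) {
      throw new Error("Unexpected symbol: " + symbol);
    }
    state = transitions[state][symbol];
  }
  return state;
}

// Assumption: state 0 is the only accepting state.
function accepts(input) {
  return run(input) === 0;
}

console.log(accepts("110")); // true  (110 is 6 in binary)
console.log(accepts("111")); // false (111 is 7 in binary)
```

Note that the machine only ever remembers which of the three states it is in, no matter how long the input is. That is what "finite" means here.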

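So what is a regular expression? Wikipedia describes it as a sequence of characters that forms a search pattern. From the Chomsky hierarchy, we also know that it describes a regular (Type 3) formal grammar, the kind that a finite automaton can recognize.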

Formal grammars were first used in linguistics and were later adopted by computer science. To represent the structure of a natural language, you parse it using rules: the grammar. Similarly, in computer science a grammar is a set of rules specifying a language; it covers syntax only, not meaning. The simplest languages are ones like LISP, for example (blah (blah blah blah)). LISP code consists mostly of parentheses, symbols and numbers, so it is very easy to parse.

The most common programming languages have C-like syntax: Java, C++, JavaScript, Perl (sort of). Not all grammars are the same; some are more powerful or allow more variation than others. You will learn more about this if you take 3803.

While it was more common in the past to parse user input with a formal grammar, we rarely do this anymore: user input now usually arrives through a form, so each piece of information is already in its own field and we don’t have to search for it.

Using regular expressions to parse note contents with links

Do not use regular expressions if you don’t have to: they are hard to write and easy to get wrong, and you will often not get the output you expect, as the example below shows. A regular expression works by building a finite automaton and running the input through its states, which makes pattern matching very fast.

Incomplete regular expression example from class. It tries to use a regular expression to find the links in our notes:

var s = "Example [http://www.google.ca link here] " + "and [http://students.carleton.ca here]";
// replace() returns a new string, so assign the result back to s
s = s.replace(/\[(.+) (.+)\]/g, '<a href="$1">$2</a>');

// s is now 'Example <a href="http://www.google.ca link here] and [http://students.carleton.ca">here</a>'

The string is not what we want. Because .+ is greedy, the first group swallows everything from the first opening bracket up to the last space, so the two links are merged into one and the href contains whitespace and unnecessary text. Avoid using regular expressions when you can, but understand how Node uses regular expressions to parse URLs in the routing functions. If you do use regular expressions, expect your first attempts to be wrong; it would be much simpler and less buggy to do it the following way.

The easier way to parse the links is to manipulate the string directly using:

  • s.indexOf('[') returns the index of the first open square bracket in the string (or -1 if there is none).
  • s.split('[') cuts the string into an array of strings at each point where there is an open square bracket.
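As a rough illustration of the string-manipulation approach, here is one way the [url label] links could be converted using only indexOf and substring. This is a sketch rather than the code written in class: the function name linkify is made up, it assumes the URL is everything up to the first space inside the brackets, and it does not escape HTML.

```javascript
// Convert "[url label]" links into HTML anchors without regular expressions.
function linkify(text) {
  var out = "";
  var pos = 0; // everything before pos has already been copied to out
  while (true) {
    var open = text.indexOf("[", pos);
    var close = open === -1 ? -1 : text.indexOf("]", open);
    if (open === -1 || close === -1) {
      break; // no more complete [ ... ] pairs
    }
    var inside = text.substring(open + 1, close);
    var space = inside.indexOf(" ");
    // With no space there is no label, so the URL doubles as the link text.
    var url = space === -1 ? inside : inside.substring(0, space);
    var label = space === -1 ? inside : inside.substring(space + 1);
    out += text.substring(pos, open);
    out += '<a href="' + url + '">' + label + "</a>";
    pos = close + 1;
  }
  return out + text.substring(pos);
}

var s = "Example [http://www.google.ca link here] and [http://students.carleton.ca here]";
console.log(linkify(s));
// Example <a href="http://www.google.ca">link here</a> and <a href="http://students.carleton.ca">here</a>
```

Each link is handled on its own because the search for the closing bracket starts at the matching opening bracket, so two links on the same line cannot be merged.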

Debugging

Code is compiled and checked for syntax before being run. Problems in server-side code produce error messages in the terminal; problems in client-side code have to be inspected directly in the browser.

Server Side Code Problems Created in Index.js:

  • Random characters in your code cause a ReferenceError, meaning the name formed by those characters is not defined. The code would run, however, if that name were defined, even if the statement doesn't do anything.
  • A missing parenthesis gives a SyntaxError, meaning you violated the grammar of JavaScript.
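The difference between the two errors can be seen in a small experiment. The sketch below uses the standard Function constructor to compile a snippet first and run it second; the helper name classify and the sample snippets are made up for illustration and are not the code broken in class.

```javascript
// Compile a snippet of JavaScript source, then run it, and report what went wrong.
function classify(source) {
  var compiled;
  try {
    compiled = new Function(source); // syntax is checked here, before anything runs
  } catch (e) {
    return "compile time: " + e.name;
  }
  try {
    compiled();
  } catch (e) {
    return "run time: " + e.name;
  }
  return "ok";
}

console.log(classify("asdfgh;"));             // run time: ReferenceError
console.log(classify("var asdfgh; asdfgh;")); // ok
console.log(classify("Math.max(1, 2;"));      // compile time: SyntaxError
```

The random name passes the syntax check and only fails when the statement is reached, while the missing parenthesis is rejected before any code runs.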

Client Side Code Problems Created in notes.js:

  • Missing a curly brace caused part of the page to be missing. To investigate, we inspected the page and opened the Console, which showed an error message saying we were missing a ; before a statement on line X. The line reported is usually the end of the program, even though the actual mistake is somewhere else. A good way to find a missing brace is to use an editor that indents based on braces and re-indent lines until a discrepancy shows up.

Assignment 8 hints

  • Error 2: Hangs, redirection failed. Redirection happens on the server side.
  • Error 3: Error status 500 means a server side problem.
  • Error 4: Why is Not Logged In appearing? This string is not normal user input, so it is probably hard-coded somewhere in the program. Find where the string appears, then look for the error nearby.
  • Error 5: Problem is client side because we made a new note and got a status 200 from the server. It is not rendering the interface for making a new note.
  • Error 6: An error in the JavaScript console means the problem is in client-side code; $ is used with jQuery.
  • Error 7: Client side is not making the right requests.
  • Error 8: Client side

Deploying web apps

Deploying Web Apps in 3 steps:

  1. Get a domain name by going to the right registrar: is it .ca or .com? DNS is a database that contains databases, arranged as a hierarchy. For example, Carleton’s Computer Science department sits in the hierarchy .ca -> Carleton -> scs -> scs machines.
  2. Get a DNS provider/server to maintain the database for your domain.
  3. Get a hosting provider to have the machine that runs the code.

You can get all three services from one provider, but this can be a problem if you come to dislike your hosting provider, since you are locked into their services in order to use your domain. If you buy the domain separately from your hosting provider, it is easy to switch hosting providers for your domain.
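To make the hierarchy from step 1 concrete, the sketch below splits a host name into its labels and lists them from the top of the hierarchy down. The function name dnsHierarchy is made up, and the host name www.scs.carleton.ca is used only as an illustrative input; no DNS lookup is performed.

```javascript
// Split a host name into DNS labels, ordered from the top-level domain down.
function dnsHierarchy(hostname) {
  return hostname
    .toLowerCase()
    .split(".")
    .filter(function (label) { return label.length > 0; }) // drop empty labels
    .reverse();
}

console.log(dnsHierarchy("www.scs.carleton.ca").join(" -> "));
// ca -> carleton -> scs -> www
```

Each label is looked after by a different database: the registrar handles the entry under the top-level domain, and your DNS provider handles everything below your own domain.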

In Lecture

Regular Expressions

 - comes out of formal language theory, originally from linguistics, but adopted by computer scientists

 - how do you represent the structure of natural languages?
   - how do you "parse" languages?
 - parse using rules

 - in CS, a "grammar" is a set of rules specifying a "language"
   - all syntax, no meaning
   - is this a valid X, where X is...English?  more likely C or Java
 - simplest languages are things like LISP
   (blah (blah blah blah) )
 - most common: C-like syntax
   Java, C++, JavaScript, Perl (sort of)

 - not all grammars are the same
 - some are more "powerful" than others
 - this is the stuff of 3803

Deploying Web Apps
------------------

* get a domain name (DNS)
  - go to the right registrar - is it .ca, .com, .sucks
* get a DNS provider/server
  - maintain DB for your domain
* get a hosting provider (machine to run web code)

You can get all three from certain providers

Don't do your own DNS server
you *can* do your own web server
How?
 - just run it on port 80
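A minimal sketch of what "run it on port 80" could look like with Node's built-in http module. The names handleRequest and startServer and the response text are made up for illustration; this is not the course's application code. On most Unix-like systems, binding to a port below 1024 requires elevated privileges, which is why the sketch reports listen errors instead of crashing.

```javascript
var http = require("http");

// Answer every request with a short plain-text message.
function handleRequest(req, res) {
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("Hello from my own web server\n");
}

// Start listening on the given port (80 is the default port for HTTP).
function startServer(port) {
  var server = http.createServer(handleRequest);
  server.on("error", function (err) {
    // For example EACCES when the process is not allowed to bind to the port.
    console.error("Could not listen on port " + port + ": " + err.code);
  });
  server.listen(port, function () {
    console.log("Listening on port " + port);
  });
  return server;
}

// To serve real traffic, call: startServer(80);
```

With the server on port 80, a browser can reach it using a plain http:// URL with no port number, once your domain's DNS records point at the machine.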