WebFund 2014W Lecture 20

The video from the lecture given on March 26, 2014 is available:
  • Small from screencast.com: http://www.screencast.com/t/PTqsRvHFX9Gd
  • Large from screencast.com: http://www.screencast.com/t/i5kHY4YQKzd
  • Original from CUOL: http://dl.cuol.ca/capture/Anil.Somayaji/COMP_2406_Lecture_20_-_20140326_142525_27.mp4

Latest revision as of 16:28, 26 March 2014

Notes

Basics of Regular Expressions

  • a pattern starts and ends with /, e.g. /abc/
  • . represents any single character
  • * is 0 or more repeats, + is one or more repeats
  • Thus .* matches any number of characters (including none)
  • () denote groups, normally for extraction or later substitution
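  • Groups are numbered from left to right starting at 1, so the first () is referred to as $1, the second as $2, and so on (the syntax used in JavaScript replacement strings)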
  • can include letter ranges in [], e.g. [a-z]
  • An all lowercase word with at least one character is: /[a-z]+/
  • | means "or" (as usual); "and" is implicit: patterns written one after another must match in sequence
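
The rules above can be tried out with a short sketch. JavaScript is assumed here because its regular expression literals use the same /.../ notation as these notes; the function names are made up for illustration, and the ^ and $ anchors (not covered in the list above) are added only so that a test applies to the whole string.

```javascript
// Hypothetical helper names; none of them come from the lecture.

// /[a-z]+/ finds the first run of one or more lowercase letters.
function firstLowercaseWord(text) {
  const match = text.match(/[a-z]+/);
  return match ? match[0] : null;
}

// Two groups, numbered left to right. In the replacement string,
// $1 is what the first () matched and $2 is what the second () matched.
function swapPair(text) {
  return text.replace(/([a-z]+)=([a-z]+)/, "$2=$1");
}

// | is "or". Writing c, then (a|u), then t in a row is the implicit "and".
// ^ and $ (an addition, not in the notes) anchor the match to the whole string.
function isCatOrCut(text) {
  return /^c(a|u)t$/.test(text);
}

// .* matches any number of characters, including none.
function startsWithAEndsWithZ(text) {
  return /^a.*z$/.test(text);
}
```

For example, swapPair("key=value") gives "value=key", and startsWithAEndsWithZ("az") is true because .* is allowed to match nothing.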

Online regular expression testers can break a pattern down and explain it

Escaped characters

  • \ is used to treat special characters as literals
  • in URLs, % followed by two hex digits denotes a character code (percent-encoding), e.g. %20 is a space
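
A minimal sketch of both kinds of escaping, assuming JavaScript and assuming the % codes refer to URL percent-encoding. The function names are hypothetical; decodeURIComponent is a standard JavaScript built-in.

```javascript
// Hypothetical helper names, for illustration only.

// \. matches a literal dot. An unescaped . would match any single character.
function looksLikeTextFile(name) {
  return /^[a-z]+\.txt$/.test(name);
}

// Same pattern without the backslash, to show the difference.
function looksLikeTextFileUnescaped(name) {
  return /^[a-z]+.txt$/.test(name);
}

// decodeURIComponent turns %XX hex codes back into the characters they stand for.
function decodePercent(text) {
  return decodeURIComponent(text);
}
```

Here "notesxtxt" is rejected by the escaped pattern but accepted by the unescaped one, because the bare . is happy to match the x.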

Chomsky hierarchy

Regular expressions are used for input sanitization

  • make sure externally provided input is "safe"
  • two strategies: whitelists and blacklists
    • whitelist: explicit list of "good" things
    • blacklist: explicit list of "bad" things
  • use whitelists when possible
  • whitelists are more work: any legitimate input you leave off the list gets rejected, so things break
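
The two strategies can be contrasted in a small sketch, again assuming JavaScript. The function names and the particular character sets are illustrative assumptions, not rules given in the lecture.

```javascript
// Hypothetical helpers; the allowed and forbidden character sets are assumptions.

// Whitelist: accept the input only if every character is on the "good" list
// (lowercase letters, digits, underscore). Everything else is rejected.
function isSafeUsername(input) {
  return /^[a-z0-9_]+$/.test(input);
}

// Blacklist: remove characters on the "bad" list (here < and >).
// Anything dangerous that is not on the list slips through unchanged.
function stripAngleBrackets(input) {
  return input.replace(/[<>]/g, "");
}
```

Note that isSafeUsername("Alice") is false: uppercase letters were left off the whitelist, so a perfectly reasonable name is rejected. That is the extra work whitelists demand, while the blacklist version fails in the other direction by letting through whatever nobody thought to list.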